Introduction


This Idea Award Expansion proposed two aims to capitalize on two discoveries made under the originally
funded Idea Award: 1) identify and transcriptionally profile Fetal Mammary Stem Cells (fMaSCs), and 2)
uncover molecular similarities between fMaSCs and human breast cancers.


Our first Aim derived refined transcriptomic profiles from cell fractions obtained by fluorescence activated cell
sorting (FACS). We used both publicly available databases of the adult human and mouse luminal,
myoepithelial, luminal progenitor, and MaSC-enriched fractions, and the fetal MaSC-enriched and stromal
populations. We developed a meta-analysis approach to derive a consensus gene signature for each fraction
by using data from all published human studies that isolated the indicated FACS fraction. This approach
reduced signature variability arising from technical and biologic differences. Our studies revealed similarities
between the normal human cell types and intrinsic breast cancer subtypes. Similar analyses performed with
the fetal and adult mouse cell fractions enabled generation of correlations with human intrinsic breast cancer
subtypes, and more precise assignment of relationships of genetically engineered mouse mammary cancer
models to both normal cell types and to the human intrinsic subtypes. An important conclusion from these
studies is that enrichment for the human luminal progenitor signature, and for one of the features of the fMaSC
signature, predicts pathologic complete response to neoadjuvant anthracycline/taxane based chemotherapies
across all human cancer subtypes. This correlation holds even after controlling for intrinsic subtype,
proliferation, and clinical variables. On the other hand, another feature of the fMaSC signature predicts
resistance to chemotherapy. These results will be described in greater detail in Dr. Perou’s Progress Report,
and can be found in a collaborative publication: Pfefferle, Spike, Wahl, and Perou (2015) Breast Cancer Res
Treat (published online, Jan. 10, 2015; DOI 10.1007/s10549-014-3262-6).


Aim 2 comprised the major focus of the Wahl lab. We used single cell RNA-sequencing to deconvolute the
fMaSC population into its component cell types. Our long term goals are to identify fetal genes and pathways that
can be used for early detection of triple negative breast cancers, and to elucidate fetal pathways uniquely used by
breast cancers exhibiting enrichment for the fMaSC signature as the basis for developing targeted therapeutic
or immunotherapeutic strategies. By studying single cell transcriptional patterns across developmental time, we
are also attempting to create precise signatures for the pathways that enable entry into and exit from the
fMaSC state.


Keywords 

Breast  Cancer  Prognosis,  Mammary  Stem  Cells,  Embryonic  Development,  Single  Cell  Transcriptomics 
Overall  Project  Summary

Statement  of  Work:  (W=Wahl  Lab;  P=Perou  Lab;  L=Lasken  Lab) 

Task  1.  Embryonic  mammary  stem  cell  signature  refinement  to  delineate  fMaSC  traits  in 
human  cancers  and  to  identify  new  targets  for  cancer  stem  cell  directed  therapies 

Task  1  is  mainly  the  work  of  the  Perou  lab.  Consequently,  Dr.  Perou  will  submit  under  separate  cover  the 
summary  of  his  lab’s  progress  to  complete  the  proposed  tasks.  I  note  that  much  of  the  data  reporting  this 
progress  is  published  in  Pfefferle  et  al,  2015  (see  references  below). 

1a. The Perou lab will obtain the gene expression raw data of the fMaSC and fStroma samples
previously  characterized  by  the  Wahl  lab.  First,  using  the  current  fMaSC  signature  of  600  genes,  a  score  for 
each  gene  will  be  assigned  based  on  its  differential  expression  across  the  two  groups.  Second,  using  a 
cross  validation  approach,  a  genomic  predictor  will  be  created  using  the  smallest  gene  list  possible  that  can 
correctly  discriminate  the  fMaSC  vs.  fStroma  samples.  Finally,  samples  used  for  the  identification  of  the 
minimum  gene  list  will  be  re-run  onto  the  Fluidigm  BioMark  platform,  and  the  optimal  classification 
ability  of  the  new  genomic  predictor  will  be  re-tested.  P,W  (months  1-2). 
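
The selection step in task 1a can be illustrated with a minimal sketch. It assumes a nearest-centroid classifier, a gene ranking by absolute difference in class means, and leave-one-out cross validation; none of these choices is specified in the Statement of Work, and the expression values and function names below are hypothetical.

```python
from statistics import mean

def rank_genes(samples, labels):
    """Rank gene indices by absolute difference in class means (largest first)."""
    classes = sorted(set(labels))
    group_a = [s for s, l in zip(samples, labels) if l == classes[0]]
    group_b = [s for s, l in zip(samples, labels) if l == classes[1]]
    n_genes = len(samples[0])
    diff = [abs(mean(x[g] for x in group_a) - mean(x[g] for x in group_b))
            for g in range(n_genes)]
    return sorted(range(n_genes), key=lambda g: -diff[g])

def predict(train, train_labels, sample, genes):
    """Assign the label of the closest class centroid, using only `genes`."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_labels)):
        rows = [s for s, l in zip(train, train_labels) if l == label]
        centroid = [mean(r[g] for r in rows) for g in genes]
        dist = sum((sample[g] - c) ** 2 for g, c in zip(genes, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def loocv_accuracy(samples, labels, k):
    """Leave-one-out accuracy using the top-k genes, re-ranked inside each fold."""
    correct = 0
    for i in range(len(samples)):
        train = samples[:i] + samples[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        genes = rank_genes(train, train_labels)[:k]
        if predict(train, train_labels, samples[i], genes) == labels[i]:
            correct += 1
    return correct / len(samples)

def smallest_gene_list(samples, labels, target=1.0):
    """Smallest k whose leave-one-out accuracy reaches `target`, else None."""
    for k in range(1, len(samples[0]) + 1):
        if loocv_accuracy(samples, labels, k) >= target:
            return k
    return None

# Hypothetical expression values: three genes, only the first is discriminating.
samples = [[9.0, 1.0, 5.0], [8.5, 1.2, 4.0], [9.2, 0.9, 6.0],
           [2.0, 1.1, 5.5], [1.5, 1.0, 4.5], [2.2, 1.3, 5.0]]
labels = ["fMaSC", "fMaSC", "fMaSC", "fStroma", "fStroma", "fStroma"]
print(smallest_gene_list(samples, labels))
```

Re-ranking the genes inside each fold keeps the held-out sample from influencing which genes are chosen, which would otherwise inflate the apparent accuracy.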

1b. The latest and most extensive cell line microarray database of the Perou Lab will be used for this
analysis. This data set includes 40 breast cancer cell lines, 12 human mammary epithelial and fibroblast
cell lines (primary and immortalized), 3 human embryonic stem cell lines and 3 mesenchymal stem cell
lines. For each cell line, the genomic Euclidean distance to the fMaSC and fStromal centroids will be
calculated, and the ratio of both distances will be the final “enrichment score”. P (months 2-4)
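
The enrichment score described in task 1b can be sketched as follows. The direction of the ratio is not stated in the text; this sketch assumes the fStromal distance is divided by the fMaSC distance, so that values above 1 indicate a profile closer to the fMaSC centroid. The expression values and function names are hypothetical.

```python
import math

def centroid(profiles):
    """Per-gene mean across a list of expression profiles."""
    n = len(profiles)
    return [sum(p[g] for p in profiles) / n for g in range(len(profiles[0]))]

def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def enrichment_score(sample, fmasc_centroid, fstromal_centroid):
    """Assumed definition: distance to fStromal centroid / distance to fMaSC centroid."""
    d_fmasc = euclidean(sample, fmasc_centroid)
    d_fstromal = euclidean(sample, fstromal_centroid)
    if d_fmasc == 0.0:
        return math.inf  # sample sits exactly on the fMaSC centroid
    return d_fstromal / d_fmasc

# Hypothetical two-gene profiles for illustration only.
fmasc_c = centroid([[4.0, 0.0], [6.0, 0.0]])      # -> [5.0, 0.0]
fstromal_c = centroid([[0.0, 4.0], [0.0, 6.0]])   # -> [0.0, 5.0]
print(enrichment_score([4.0, 0.0], fmasc_c, fstromal_c))  # fMaSC-like cell line
print(enrichment_score([5.0, 5.0], fmasc_c, fstromal_c))  # equidistant cell line
```

A ratio has the practical advantage of being unitless, so scores remain comparable across samples even when overall expression scale differs.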

1c. In addition, to further compare the levels of gene expression of the fMaSC- and fStromal-enriched
populations with the Perou lab's cell lines, ~12 fMaSC/fStromal samples will be collected, RNA isolated,
amplified  and  hybridized  onto  the  Perou  Lab  Whole  Genome  Custom  Array  Platform,  and  their  gene 
expression  profiles  compared  to  the  rest  of  cell  lines  using  supervised  and  unsupervised  hierarchical 
clusterings.  P,W  (months  3-8). 

1d. The association of the MaSC signature with pathological complete response will be evaluated across
multiple  data  sets  with  annotated  clinical  data  and  where  gene  expression  microarrays  had  been 
performed  in  the  pre-treatment  samples.  Each  sample  will  be  assigned  an  enrichment  score  as  described 
above.  The  association  of  the  score  with  pathCR  will  be  evaluated  in  all  patients  and  also  within  each 
intrinsic  molecular  subtype  as  determined  by  the  PAM50-subtype  predictor.  For  those  data  sets  with  survival 
data,  association  with  survival  outcomes  will  also  be  evaluated  using  univariate  and  multivariate  Cox- 
model  analyses.  Finally,  the  enrichment  scores  during  and  after  chemotherapy  will  be  calculated  in  the 
samples  of  the  ISPY-1  trial  and  also  in  one  publicly  available  data  set  where  pre-  and  post-treatment 
samples  after  single  agent  docetaxel  or  endocrine  therapy  were  profiled.  P  (months  5-8). 

1e. The association of the MaSC signature with the development of distant metastases will be evaluated
using univariate and multivariate analyses in >800 primary tumor samples where the location and time of the
first site of distant relapse was documented (within all patients and also each of the intrinsic subtypes).
A signature enrichment score will be calculated as described above. A similar approach will be used for
those samples of the UNC database where matched primary and metastatic lesions were profiled. P (months 7-10).

1f. The genes comprising refined fMaSC and fStromal signatures will be prioritized according to their
novelty as potential therapeutic targets and tractability for functional testing. Scientific/Clinical literature will
be surveyed to determine novelty. Scientific/Clinical and Company literature will be surveyed to
determine the availability of reagents for functional testing. Cell line expression profiles from the ATCC
breast cancer collection and the lines referenced in task 1a. will be bioinformatically evaluated for the
presence of signatures suggesting relevance of fMaSC and fStromal genes in the prioritized list, and these
cell lines will be selected for functional analysis. P,W (months 2-10)

1g. Reagents such as small molecule inhibitors, receptor specific antibodies and gene clones or inhibitory
RNA constructs will be collected for the top candidates. Activating and deactivating genes will be
cloned into existing inducible lentiviral vectors. High titer lentivirus will be produced and validated for
inducibility, and RNA inhibition or protein production will be validated as appropriate. Other targeted reagents
will be validated using standard molecular biological approaches where necessary. W (Months 8-14)

1h. The ability of viral vectors or other reagents to specifically impact their molecular targets will be
examined in breast cancer cell lines through 2D and 3D culturing systems and standard molecular biological
approaches (e.g. western blotting, immunofluorescence/immunocytochemistry, RT-PCR etc. examining receptor
phosphorylation, protein localization, mRNA abundance, etc.). Cellular effects on proliferation, survival and
migration  will  also  be  assayed.  W  (months  14-18) 

1i. Cell lines exhibiting biological responses to activation or inhibition of fMaSC and fStromal pathways
in  vitro  will  be  transduced  with  lentiviral  vectors  and  injected  as  xenografts  into  immune  compromised 
mice.  Tumor  growth  and  metastasis  will  be  evaluated  in  real  time  using  luminescent  imaging.  W  (months 
16-24) 

Task  2.  Embryonic  mammary  stem  cell  signature  refinement  using  RNA-seq  and  functional 
validation 


2a. The Wahl lab will obtain timed pregnant female mice, obtain embryos from E18.5, dissect mammary
rudiments, and isolate fMaSC and fStroma by methods they have developed. The cells will be flow sorted to
obtain fMaSC enriched (CD49fhighCD24highNCAM-) and fStromal (CD49f+, CD24-/+) populations. W
(months 1-2)

»»This has been accomplished (see data below).

The Lasken lab will receive fMaSC enriched and fStromal cells from the Wahl lab. The fMaSC enriched
cells will be micromanipulated to obtain single cells, which will then be lysed to preserve RNA
integrity and maximize efficiency for generating cDNA for SOLiD sequencing. L (months 1-2).

»»This  was  accomplished  (see  data  below). 

2b.  cDNA  will  be  generated  from  each  cell  and  amplified  by  a  PCR  method  under  conditions 
suitable  for  subsequent  SOLiD  DNA  sequencing  L  (months  2-3). 


2c. Each single cell cDNA preparation will be pre-screened for the fMaSC phenotype based on
expression of K14 and K8 using qRT-PCR L,W (months 2-3).


2d. The highest quality cDNA samples representing 10 additional K14+K8+, 5 K14+K8- and 5 K14-K8+
cells will be RNA sequenced by the SOLiD sequencing method at a level generating about 40 million
sequence reads/cell. L (months 3-6).

»»Tasks 2a-2d were initially done manually using a variety of methods. However, approximately 18 months
ago, a Fluidigm C1 single cell microfluidic instrument capable of lysing 96 single cells in situ, and then
preparing cDNA samples from them, was obtained by the California Institute for Regenerative Medicine
(CIRM), which is next to the Salk. We were granted approval to use this instrument. We isolated approximately
100 individual cells from E18.5 in several independent experiments and loaded them onto the C1 to obtain
cDNA. An example of the data obtained is shown in Figure 1. Of note, this instrument enables one to evaluate
the cells microscopically using a live-dead cell dye to restrict all RNA-sequencing data to live cells. As can be
seen, we detected cells co-expressing K14 and K8 RNA (Figure 1, Cells #4-6), and the RNA-seq data follow the
gene models very well.

Figure 1. Genomic sequence alignments of RNA derived sequences from several
individual fMaSC cells and control samples. The alignments show high concordance
with annotated exon structures (Gene Models). The data also show high technical
reproducibility between replicate sequencing experiments for each sample.
Controls comprise pools of fMaSC cells processed using the same
biochemistry as the C1 protocol (fMaSC Smart-seq/Nextera) or lacking reverse
transcriptase (Negative control). An fMaSC pool and a Stromal pool (fStr) processed
by an alternative approach (Tru-Seq) that works on bulk samples are also provided.

2e. RNA-Seq data will be analyzed to discover additional genes and gene clusters associated with the
fMaSC cells. These data will be combined with the analysis of 10 K14+K8+ cells that are currently being
sequenced with funding from the JCVI and a Salk Cancer Center Starter award to yield a total of 20
K14+K8+ cells analyzed. The sequence and data analyses will be conducted jointly by the Lasken and
Wahl labs. L, W (months 7-13).


2f. A list of markers will be generated through bioinformatics analysis of single cell RNA-Seq data to
identify markers associated with distinct cell types. The literature will be surveyed for the availability of
reagents for the prospective isolation of the distinct cell types using the identified markers. Reagents will
be acquired. Cells will be isolated based on these markers, the fidelity of separation of the individual
cell types will be checked by PCR-based analysis, and re-sorting will be carried out to assess purity of
sorting. P, L, W (months 12-15)


[Figure 2, panel B labels: Fetal Mammary Stem Cell-like Cells; Basal-like Cells; Luminal-like Cells]


»»Our initial evaluation of the gene expression profiles within the fMaSC population at single cell
resolution showed the cells to be heterogeneous with no clear subpopulation likely to correspond to a
distinct stem cell population. However, through the use of the C1 instrument, we were able to broaden
our research approach to include additional developmental states that could be used to delineate gene
expression changes that define the gain and loss of the stem cell phenotype over the course of
development. That is, instead of focusing just on E18 cells, we decided to obtain cells from throughout
development so that we could have a data set that would position us to identify the pathways that are
altered in going from the pre-stem cell state at E16, to the stem cell state at E18, and then into the
differentiated myoepithelial and luminal lineages associated with adult mammary development. We have now
sequenced hundreds of cells from E18, P0, P4 and adult mice, and have clustered the data using multiple
bioinformatic methods to assign cellular phenotypes. As one example, we used the Monocle strategy to infer
lineages based on generation of minimum spanning trees of transcriptional relatedness (Figure 2 A,B). We
then derived an independent approach that does not use the Monocle assumption of direct lineage
relationships between cells of close transcriptional relatedness, which resulted in a very similar outcome
(Figure 2 C-E).

These methods proved robust for separating cells according to the developmental time at which they were
obtained, and to the lineages to which they are most related. Further analysis based on these approaches has
enabled us to define a subset of cells from E18 that appear to be uncommitted to either the luminal or
myoepithelial lineages, and yet express genes indicative of both. For example, these cells express both
luminal and myoepithelial cytokeratins, as well as lineage specification genes including Sox10, GATA3, and
Elf5. The methods and examples of data obtained from these analyses were presented in the Progress Report
submitted last year. A manuscript describing these studies is currently being prepared and is only awaiting
data obtained from cells in the “pre-” stem cell state at E15-16.

Figure 2. Unsupervised identification of cell types and candidate regulatory genes
from single cell data. A) Monocle plot of single cell relatedness (i.e. proximity in the
graph), minimum spanning tree model of differentiation (‘pseudotime’) and identification
of three putative cell types (Luminal-like, Basal-like and fMaSC-like). B) Expression
levels of three highly fMaSC associated genes in single cells that have been reorganized
according to their position along the pseudotime, minimum spanning tree. Note: The
majority of adult cells are plotted on the x-axis for these three genes as they are rarely
expressed in adult cells (i.e. y=0). C-D) An alternative approach yields similar but more
refined results. C) Using our alternative approach, cells are found to be distributed along
a continuum from E18 to adult (vertical axis). Clustering of these cells according to gene
expression ranks identifies known Luminal and Basal adult cell types, a novel adult cell
type and several cell states along the continuum from the most primitive cells to the
adult. D) Fine scale analysis of the most primitive cells identifies genes associated with
the earliest differentiation events as bi-phenotypic cells become more luminal (green),
more basal (pink) or more niche related (blue).
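
The minimum spanning tree idea behind the Monocle-style ordering discussed above can be sketched in a few lines. This is a simplified illustration, not the Monocle implementation: it builds the tree directly on Euclidean distances between hypothetical two-gene expression profiles (the published method also reduces dimensionality before building the tree), and it defines pseudotime as the distance along the tree from an assumed root cell.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def minimum_spanning_tree(cells):
    """Prim's algorithm; returns an adjacency dict {i: [(j, weight), ...]}."""
    n = len(cells)
    in_tree = {0}
    adjacency = {i: [] for i in range(n)}
    while len(in_tree) < n:
        best = None
        # Find the cheapest edge linking a cell in the tree to one outside it.
        for i in in_tree:
            for j in range(n):
                if j not in in_tree:
                    w = euclidean(cells[i], cells[j])
                    if best is None or w < best[0]:
                        best = (w, i, j)
        w, i, j = best
        adjacency[i].append((j, w))
        adjacency[j].append((i, w))
        in_tree.add(j)
    return adjacency

def pseudotime(adjacency, root):
    """Distance along the tree from the root cell to every other cell."""
    dist = {root: 0.0}
    stack = [root]
    while stack:
        i = stack.pop()
        for j, w in adjacency[i]:
            if j not in dist:
                dist[j] = dist[i] + w
                stack.append(j)
    return dist

# Hypothetical profiles; cell 0 is taken as the most primitive (root) cell.
cells = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [2.0, 1.0]]
tree = minimum_spanning_tree(cells)
times = pseudotime(tree, root=0)
print(sorted(times, key=times.get))  # cells ordered from root outward
```

Note that the tree treats transcriptionally close cells as lineage neighbors, which is the assumption the alternative approach described above avoids.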

We have experienced significant technical problems using the C1 instrument to analyze cells derived from
E15-16. Recently, we have returned to working with the Lasken lab to sort E15-16 cells directly into
wells of a 384-well plate, and then used the SMART-seq2 protocol to prepare cDNA libraries. Sequencing of
these libraries is currently in progress.

2g. Cells will be sorted using population specific markers. In vitro colony growth, serial replating ability and
immunofluorescent analysis of bipotent progeny will be evaluated for each candidate marker. W (months
13-18)

2h. Markers yielding stem cell phenotypes in vitro will be used to sort fetal mammary cells. Cells will be
transplanted at limiting dilution to reconstitute murine mammary glands; additionally, single cells will be
transplanted to reconstitute murine mammary glands. W (months 18-24)
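
The report does not state how limiting dilution transplant results are converted into a stem cell frequency. One common choice, shown here only as a sketch under that assumption, is the single-hit Poisson model, in which the probability that a transplant of d cells yields no outgrowth is exp(-f·d) for a repopulating frequency f. The function names and transplant counts below are hypothetical.

```python
import math

def score(f, data):
    """Derivative of the single-hit Poisson log-likelihood with respect to f.

    data: list of (cells_per_transplant, n_transplants, n_with_outgrowth).
    """
    total = 0.0
    for dose, n_transplants, n_positive in data:
        n_negative = n_transplants - n_positive
        p_negative = math.exp(-f * dose)
        total += n_positive * dose * p_negative / (1.0 - p_negative)
        total -= n_negative * dose
    return total

def estimate_frequency(data, lo=1e-9, hi=1.0, iterations=200):
    """Maximum-likelihood frequency by bisection on a log scale.

    Requires at least one positive and one negative transplant in the data.
    """
    for _ in range(iterations):
        mid = math.sqrt(lo * hi)
        if score(mid, data) > 0:
            lo = mid   # likelihood still increasing: frequency is higher
        else:
            hi = mid
    return math.sqrt(lo * hi)

# Hypothetical experiment: 10 transplants of 100 cells each, 5 with outgrowths.
data = [(100, 10, 5)]
f = estimate_frequency(data)
print(f"estimated frequency: 1 in {1.0 / f:.0f} cells")
```

With a single dose the estimate reduces to -ln(fraction negative)/dose, which gives a quick check on the numerical result.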

2i. Additional RNA-seq will be performed to refine the data obtained from validated fMaSCs. A second
SOLiD sequencing run will be carried out on cDNA from ten single cells to refine and test conclusions
obtained in the first year of the grant. L,W (Months 14-18).

»»We have accomplished the major goals of Aims 2g-i, and provide the following as one important
example of the value and relevance of the data obtained. The results, summarized briefly below, identified
the cell state regulator SOX10 as a developmental control gene that identifies and highly enriches for fMaSCs
and that is also required for the fMaSC state (Dravis et al (2015), Cell Reports, v. 12, pgs 2035-2048).

Our expression profiling identified SOX10 as one of the most differentially regulated genes in fMaSCs
using both microarrays and single cell RNA sequencing (Dravis et al, Figure 1A, pg 2036). We obtained a
mouse expressing an H2B-Venus transgene under the control of the SOX10 endogenous promoter. The
mammary epithelial cells in the embryonic rudiments were brilliantly labeled, while there was little if any
staining in the surrounding stroma (Dravis et al, Figure 2A, pg. 2038). We used FACS to obtain various cell
fractions on the basis of their expression of different levels of EpCAM or SOX10 (i.e., Venus fluorescence). We
found that only those cells that were EpCAM+ and SOX10-high exhibited all of the properties expected of
fMaSCs: generation of polarized organoids in vitro, ability to generate fully functional mammary outgrowths from
limited numbers of cells transplanted into de-epithelialized fat pads, and ability to “self-renew” as assayed by
multiple rounds of transplantation, or dissociation and re-formation of spheres in vitro (Dravis et al, Figure 2E,
F, G, H, I, pg 2038). We also obtained mice with floxed SOX10 genes, deleted the SOX10 genes from
fMaSCs in vitro, and found that fMaSCs lacking SOX10 expression no longer formed organoids in vitro or
transplanted in vivo (Dravis et al, Figure 4A-E, pg. 2041). Finally, we showed that over-expressing SOX10 led
to two very important phenotypes. First, after short periods of expression, we found the fMaSCs could form
secondary organoids with much higher efficiency than if they did not express high SOX10 levels. Second, if
SOX10 expression was maintained at high levels, the fMaSCs lost expression of epithelial markers, no longer
expressed luminal or basal cytokeratins, gained expression of vimentin, and became motile but
non-proliferative (Dravis et al, Figure 5A-F, pg. 2042). In other words, they acquired many characteristics of cells
that had undergone an epithelial-mesenchymal transition. Importantly, reducing SOX10 expression in the cells
that had moved away from the organoids to set up solitary “satellites” resulted in reversion of the cells to an
epithelial state, re-entry into the cell cycle, and restoration of their ability to generate both luminal and
myoepithelial descendants. In other words, the stem state was readily reversed depending on SOX10 levels.
We have begun to search for in vivo conditions that regulate SOX10 in the mammary gland and that could be
relevant to fMaSC genesis and breast cancer biology. We found that FGF10 specifically induces SOX10
transcription, and that either leaving FGF10 out of the culture medium, or using a pan-FGF receptor inhibitor,
prevents SOX10 transcriptional activation, and prevents fMaSCs from undergoing an EMT (Dravis et al, 2015,
Figures 1B-E, pg 2036). Interestingly, FGF10 is one of the factors produced during wound healing. We
speculate that, as wound signatures have been correlated with initiation and progression of breast cancer,
exposure to FGF of the fMaSC-like cells we have documented to be present in basal-like breast cancers may enable
them to acquire motility, depart the local tumor environment, and metastasize to distant sites at which, if
exposed to a lower FGF environment, they may reverse their phenotype, become more stem-like, and produce
a heterogeneous cellular mass at an ectopic location.

2j.  Identification  of  gene  signatures  corresponding  to  fMaSC  from  bioinformatic  analysis  (task  2e,f  )  and 
bioinformatic  refinement/reduction  of  the  signature.  Selection  of  candidate  markers  for  analysis  of  fMaSC 
contribution  to  archival  tumor  samples  and  tissue  analysis  P,L,W  (months  12-24) 

»»We found that elevated SOX10 expression is found in Basal-like and some Claudin-low human
breast cancers (Dravis et al, 2015, Figure 1F, pg 2036).

2k.  Immuno-histochemical  and  in  situ  analysis  of  archival  tumor  tissue.  P,W  (months 
12-24). 

»»We  are  now  developing  the  collaborations  we  need  to  obtain  relevant  samples 
from  UCSD,  and  we  continue  to  work  with  Dr.  Perou  to  analyze  his  human  and  mouse 
tissue  samples  as  we  derive  additional  informative  signatures.  Unfortunately,  we  have 
found no SOX10 antibodies suitable for IHC or IF analyses.

Key  Research  Accomplishments: 

Aim  1 

1.  Development  of  a  meta-analysis  approach  to  derive  more  precise  signatures  for  normal  mammary  cell 
luminal,  progenitor,  myoepithelial,  and  stem  cell  populations  from  human  and  mouse  systems.  This 
method  proved  more  robust  than  using  single  studies  for  analysis,  and  sets  a  precedent  for  use  of  such 
meta-analysis-derived  signatures  in  future  studies. 

2.  Application  of  refined  signatures  based  on  normal  mammary  cell  types  to  analysis  of  human  breast 
cancers  and  mouse  cancer  models  to  determine  which  normal  cell  types  correspond  most  closely  to 
cancers  in  each  species. 

3.  Use  of  single  sample  classifiers  revealed  diversity  of  cellular  relationships  among  each  GEMM  and 
human  breast  cancer  intrinsic  subtype. 

4. Demonstration that the human luminal progenitor and one feature of the mouse fMaSC signature
correlate with pCR across all human breast cancer subtypes, and retain significance in multi-variable
analyses including proliferation, subtype, and clinical parameters. Importantly, one feature of the
fMaSC profile associated with luminal attributes predicted poor response to anthracycline/taxane
based chemotherapy for patients whose tumors display enrichment for this profile.

Aim  2 

1.  Obtained  transcriptomes  from  hundreds  of  individual  cells  across  four  developmental  time  points 
critical  for  understanding  mechanisms  of  acquisition  and  loss  of  the  stem  cell  state  during  mouse 
mammary  development. 

2. Use of transcriptomic data to identify candidate transcriptional regulators relevant to acquisition of
mammary stemness. Identification of SOX10 as one such gene.

3. Demonstrated that SOX10 expression in fetal mammary cells uniquely identifies the stem cell population. This
discovery facilitated isolation of the purest fMaSC population to date, which enabled obtaining
more precise transcriptomic data.

4. Genetic strategies were employed to show that SOX10 is required for fMaSC function in vitro and in
vivo.

5. Developed a genetic system to enable analysis of the effects of SOX10 overexpression. These studies
showed that persistent SOX10 expression preserves fMaSC multipotentiality, but long term high
SOX10 expression causes fMaSCs to undergo a mesenchymal transition that does not correlate with
elevated levels of Slug, Snail, Zeb1, or Twist as reported for other systems. The mesenchymal
transition was reversed upon reducing SOX10 levels.

6. Gene expression and functional studies revealed a positive feedback loop between FGF signaling and
SOX10. Elevated SOX10 led to upregulation of potentiators of FGF signaling, and downregulation of
FGF signaling antagonists.
Conclusion


Our new data are consistent with our previous studies showing that fMaSC signatures contain unique
combinations of expressed genes with relevance to human breast cancer biology, including the response of
breast cancers of all intrinsic subtypes to chemotherapy. We have thus developed a potentially useful metric
for clinical decision-making. We continue to improve methods for performing single cell RNA-seq, and for
bioinformatically analyzing the results. These studies revealed the potential relevance of SOX10 to fMaSC
biology, which we established using a combination of in vitro and in vivo approaches.
Abstract Mammary gland morphology and physiology are supported by an underlying cellular differentiation
hierarchy. Molecular features associated with particular cell types along this hierarchy may contribute to the
biological and clinical heterogeneity observed in human breast carcinomas. Investigating the normal cellular
developmental phenotypes in breast tumors may provide new prognostic paradigms, identify new targetable
pathways, and explain breast cancer subtype etiology. We used transcriptomic profiles coming from
fluorescence-activated cell sorted (FACS) normal mammary epithelial cell types from several independent
human and murine studies. Using a meta-analysis approach, we derived consensus gene signatures for both
species and used these to relate tumors to normal mammary epithelial cell phenotypes. We then compiled a
dataset of breast cancer patients treated with neoadjuvant anthracycline and taxane chemotherapy regimens
to determine if normal cellular traits predict the likelihood of a pathological complete response (pCR) in a
multivariate logistic regression analysis with clinical markers and genomic features such as cell proliferation.
Most human and murine tumor subtypes shared some, but not all, features with a specific FACS-purified
normal cell type; thus for most tumors a potential distinct cell type of ‘origin’ could be assigned. We found
that both human luminal progenitor and mouse fetal mammary stem cell features predicted pCR sensitivity
across all breast cancer patients even after controlling for intrinsic subtype, proliferation, and clinical
variables. This work identifies new clinically relevant gene signatures and highlights the value of a
developmental biology perspective for uncovering relationships between tumor subtypes and their potential
normal cellular counterparts.

Keywords Breast cancer • Comparative genomics • Genetically engineered mouse models •
Genomic signatures • Neoadjuvant chemotherapy • Normal mammary tissue

Introduction 

The mammalian breast is a dynamic organ, with major morphological changes occurring during organogenesis,
puberty, pregnancy, lactation, and involution [1]. Underlying these mammary gland changes is a complex cell
hierarchy that supports these processes [2-4]. The simplest model places the multipotent mammary stem cell
(MaSC) at the base of this hierarchy, having extensive, self-regenerative potential [5]. During mammary
development, the MaSC has been proposed to divide asymmetrically to produce basal/myoepithelial cells as
well as luminal progenitors (LumProg), which have more restricted proliferative and differentiation
capabilities [5]. LumProg cells are capable of further differentiation into mature luminal (MatureLum) cells,
such as estrogen receptor (ER)-positive ductal epithelium, which have an even more limited proliferative
potential and some of which are terminally differentiated [5].

Breast tumors may originate from several, if not all, of
the cell types within this complex mammary hierarchy.
These various cellular foundations for tumor initiation may
help explain the heterogeneous nature of human breast
tumors [6], which consist of multiple histological and
genomic subtypes; these genomic groups, which are
defined by their gene expression profiles, have become
known as the intrinsic subtypes of breast cancer and are
referred to as basal-like, claudin-low, HER2-enriched,
luminal A, and luminal B [7-10]. A simple etiological
explanation for these different subtypes involves a one-to-one
relationship between each intrinsic subtype and a distinct
cell type of origin that largely maintains its phenotypic
identity after oncogenic transformation; however,
both normal and neoplastic non-stem cells can acquire
stem-like properties, suggesting that the normal cell hierarchy
model could also include an element of reversibility
[11]. This also raises the possibility that molecular features
defining tumor subtypes may be acquired during
tumorigenesis [12].

Genetically engineered mouse models (GEMMs) of
breast carcinoma develop heterogeneous tumors [13, 14],
but the extent to which they represent human disease is
an area of active investigation. We previously showed
that murine mammary tumors comprise at least 17 distinct
intrinsic subtypes/classes, with eight classes being
identified as strong human subtype counterparts by gene
expression similarity [14]. As with human breast cancer,
the degree to which murine models reflect normal
mammary epithelial subpopulations requires further analysis.
Characterization of the cellular features of these
murine classes is also needed to better determine their
preclinical utility, to shed light on trans-species associations
[14], and to help interpret preclinical study observations
[15-18].

Several studies have independently profiled
fluorescence-activated cell sorted (FACS) purified normal mammary
cell types from both human [19-21] and murine [22,
23] mammary tissues. Here, we use a meta-analysis
approach to compare the transcriptomic profiles from
FACS-enriched mammary cell populations with each other
and with primary tumors. These data not only identify a
number of clinically relevant biomarkers that may be
useful for predicting chemotherapy benefit, but also suggest
a cell type of origin for many tumor subtypes.

Methods 

Detailed  methods  can  be  found  in  Supplemental  File  1. 

Mammary  cell  subpopulation  gene  signatures 

Gene expression measurements from FACS-enriched
mammary subpopulations were obtained from three human
and two murine published studies: GSE16997 [19],
GSE19446 [22], GSE27027 [23], GSE35399 [20], and
GSE50470 [21]. Using a meta-analysis approach, a consensus
‘enriched’ gene signature was produced for each
mammary subpopulation. ‘Enriched’ signatures comprised
genes that were identified as being uniquely and highly
expressed (false discovery rate (FDR) < 5 %) within a
given subpopulation as determined using a two-class
(subpopulation X versus all others) significance analysis of
microarrays (SAM) analysis [9, 24]. Each ‘enriched’ signature
was further refined by supervised clustering using
the human UNC308 breast tumor dataset [9] to identify
subpopulation ‘features’, which were defined as having at
least ten genes with a Pearson correlation greater than 0.5
across all tumors [15, 25]. Expression scores for gene
signatures were determined by calculating the mean
expression of the signature within each tumor; all gene
signature lists are provided in Supplemental Table 1.
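
The consensus step and the expression score described above can be
sketched as follows. This is a minimal illustration rather than the
pipeline used in the study: it assumes the per-study SAM gene lists
(FDR < 5 %) are already available, and the function names
`consensus_signatures` and `signature_score`, as well as the toy gene
labels, are hypothetical.

```python
from statistics import mean


def consensus_signatures(per_study):
    """Build one consensus 'enriched' signature per subpopulation.

    per_study maps a subpopulation name to a list of gene sets, one set
    per study, each holding the genes called enriched in that study.
    """
    # Step 1: keep genes enriched for the subpopulation in every study.
    overlap = {pop: set.intersection(*gene_sets)
               for pop, gene_sets in per_study.items()}
    # Step 2: drop genes that survive step 1 for more than one
    # subpopulation (assumption: uniqueness is judged on the overlaps).
    consensus = {}
    for pop, genes in overlap.items():
        others = set().union(*(g for p, g in overlap.items() if p != pop))
        consensus[pop] = genes - others
    return consensus


def signature_score(expression, signature):
    """Mean expression of the signature genes within one tumor.

    expression maps gene -> value for a single tumor. Genes missing
    from the tumor profile are skipped (an assumption of this sketch).
    """
    return mean(expression[g] for g in signature if g in expression)


# Toy input: two studies per subpopulation, placeholder gene labels.
toy = {
    "LumProg": [{"A", "B", "C"}, {"A", "B", "C", "D"}],
    "MatureLum": [{"C", "D", "E"}, {"C", "E"}],
}
signatures = consensus_signatures(toy)
score = signature_score({"A": 2.0, "B": 4.0, "C": 9.0}, signatures["LumProg"])
```

In the toy input, gene C passes the cross-study overlap for both
subpopulations and is therefore removed from both signatures.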

Mammary  cell  subpopulation  centroids 

Mammary cell subpopulation centroids were created using
the union of the ‘enriched’ epithelial gene signatures.
A distance weighted discrimination (DWD) single sample
predictor [26] was used to calculate the shortest Euclidean
distance between each tumor and each epithelial
cell-enriched centroid. Samples with a positive silhouette width
were considered to have a strong association with a given
subpopulation [27].
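
The assignment rule above can be illustrated with a plain
nearest-centroid classifier followed by a silhouette filter. The sketch
below is an assumption-laden stand-in, not the DWD single sample
predictor itself: it omits DWD normalization, uses the standard
silhouette definition s = (b - a) / max(a, b), and the names
`nearest_centroid`, `silhouette_widths`, and `classify` are
hypothetical.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest_centroid(sample, centroids):
    """Label of the centroid with the shortest Euclidean distance."""
    return min(centroids, key=lambda lab: euclidean(sample, centroids[lab]))


def silhouette_widths(samples, labels):
    """Silhouette width per sample; requires at least two distinct labels.

    a: mean distance to the other samples carrying the same label.
    b: smallest mean distance to the samples of any other label.
    """
    widths = []
    for i, (x, lab) in enumerate(zip(samples, labels)):
        same = [euclidean(x, y)
                for j, (y, other) in enumerate(zip(samples, labels))
                if other == lab and j != i]
        if not same:  # singleton group: width set to 0 by convention
            widths.append(0.0)
            continue
        a = sum(same) / len(same)
        b = min(
            sum(dists) / len(dists)
            for other in set(labels) - {lab}
            for dists in [[euclidean(x, y)
                           for y, l in zip(samples, labels) if l == other]]
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom else 0.0)
    return widths


def classify(samples, centroids):
    """Nearest-centroid call, kept only when the silhouette is positive."""
    labels = [nearest_centroid(s, centroids) for s in samples]
    widths = silhouette_widths(samples, labels)
    return [lab if w > 0 else "unclassified"
            for lab, w in zip(labels, widths)]


# Toy two-gene profiles; the last sample sits between the two groups.
toy_centroids = {"LumProg": [0.0, 0.0], "MatureLum": [10.0, 0.0]}
toy_samples = [[-4.0, 0.0], [-4.0, 0.0], [9.0, 0.0], [10.0, 0.0], [4.5, 0.0]]
calls = classify(toy_samples, toy_centroids)
```

The last toy sample is closest to the LumProg centroid but lies nearer,
on average, to the MatureLum samples, so its silhouette width is
negative and it is reported as unclassified.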

Chemotherapy  response 

A combined breast cancer gene expression dataset of
patients treated with neoadjuvant anthracycline and taxane
chemotherapy regimens was created from three public
datasets: GSE25066 [28], GSE32646 [29], and GSE41998
[30]. Univariate (UVA) and multivariate (MVA) logistic
regression analyses were used to determine if gene signatures
derived from normal cell populations were capable of
predicting pathological complete response (pCR).
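
As an illustration of the univariate analysis, the sketch below fits a
one-predictor logistic regression by Newton-Raphson and reports the
odds ratio as exp(b1). It is a minimal example under stated
assumptions: the function name `fit_logistic` and the toy data are
hypothetical, no confidence interval is computed, and the multivariate
case (additional clinical covariates) is not implemented. The toy
predictor is dichotomized only so that the fitted odds ratio can be
checked against the 2 x 2 table by hand; signature scores in practice
are continuous.

```python
import math


def fit_logistic(x, y, iterations=25):
    """Fit P(y = 1) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton-Raphson.

    x: predictor values (for example, a signature score per patient).
    y: 0/1 outcomes (1 = pCR). Returns (b0, b1).
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            # Gradient of the log-likelihood.
            g0 += yi - p
            g1 += (yi - p) * xi
            # Negative Hessian (observed information).
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + H^-1 * gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Toy cohort: low-score group with 2/10 pCR, high-score group with 5/10.
toy_x = [0] * 10 + [1] * 10
toy_y = [1] * 2 + [0] * 8 + [1] * 5 + [0] * 5
b0, b1 = fit_logistic(toy_x, toy_y)
odds_ratio = math.exp(b1)  # cross-product ratio: (5 * 8) / (5 * 2) = 4
```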
Results

Comparison  of  human  mammary  subpopulation 
transcriptomic  datasets 

Several groups have independently obtained transcriptomic
profiles of normal human breast cells and compared the
genomic biology of these different cell types with human
tumors [19-21]. In these studies, normal mammary tissues
obtained from female donors were FACS sorted using cell
surface markers to enrich for specific mammary subpopulations
before microarray analysis (Table 1; Fig. 1). While
these initial studies were important, the datasets themselves
were relatively small (n = 12 for Lim et al. [19], n = 72
for Shehata et al. [20], n = 18 for Prat et al. [21]), and few
if any comparisons across studies were performed.
Importantly, FACS-based cell fractionation can only enrich
for specific subpopulations. Therefore, transcriptomic
profiles reflect features of other contaminating cell types to
varying degrees. As such, study-specific biases may be
present in any single dataset; therefore, we used consensus
information from all three FACS-enriched human
transcriptomic datasets to reduce technical and study-specific
biases.

Following DWD normalization [26], unsupervised
clustering of the most variably expressed genes was performed
using Gene Cluster v3.0 by selecting all genes with an
absolute log2 expression value greater than three in at least
four samples (212 genes) (Fig. 2a). In general, the four
major array dendrogram nodes correspond to the four
FACS-enriched mammary subpopulations, indicating that
the most highly and variably expressed genes are similarly
expressed across the different studies. Even when using all
genes in the dataset, there is a high Pearson correlation
within a given subpopulation across studies and low
correlations to other subpopulations (Fig. 2b).
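
The gene filter and the correlation measure used in this comparison
can be sketched in a few lines. The code below is illustrative only:
it assumes expression values are already normalized, centered log2
values (so that an absolute value above three marks a strongly
deviating gene), and the function names `variable_genes` and `pearson`
and the toy matrix are hypothetical.

```python
import math


def variable_genes(matrix, threshold=3.0, min_samples=4):
    """Genes whose |log2 expression| exceeds threshold in enough samples.

    matrix maps gene -> list of log2 expression values, one per sample.
    """
    return [gene for gene, values in matrix.items()
            if sum(abs(v) > threshold for v in values) >= min_samples]


def pearson(u, v):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    sd_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    sd_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (sd_u * sd_v)


# Toy matrix with five samples; only g1 and g3 pass the filter.
toy_matrix = {
    "g1": [3.5, -3.2, 4.0, -3.1, 0.1],
    "g2": [0.2, 0.1, 3.5, -0.3, 0.0],
    "g3": [3.1, 3.1, 3.1, 2.9, -3.5],
}
selected = variable_genes(toy_matrix)
```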

On a per-sample basis, the first principal component
separated the stroma and adult mammary stem cell (aMaSC)
samples from the LumProg and MatureLum samples
(Fig. 2c). The second principal component separated the
stroma and aMaSC samples into distinct groups, while the
third principal component separated the LumProg and
MatureLum samples into distinct groups. The aMaSC
subpopulation displayed the highest level of variation, which is
likely attributable to varying degrees of contamination by
other cell types.

Human  mammary  cell  subpopulation  enriched  gene 
signatures 

As shown in Fig. 2, there is a natural degree of variation
between samples of a given subpopulation. We therefore
developed gene signatures for each human mammary
subpopulation by integrating consensus information across
all three datasets (Table 1) to identify the highest confidence
subpopulation-specific genes. First, genes highly
expressed (FDR < 5 %) within each mammary subpopulation
were found using a two-class (subpopulation X versus all
others) SAM analysis [24] within each
dataset [19-21]. Second, the overlap of genes highly
expressed within a particular subpopulation across studies
was determined. Lastly, as it is possible in the above
analysis to have the same gene in the signature of more
than one subpopulation, genes that were identified to be
significantly associated with more than one subpopulation
were removed. This resulted in a single, consensus
Homo sapiens-enriched (HsEnriched) signature per
subpopulation (Fig. 3a). The average Euclidean distance was
determined using a 10-fold cross validation for each normal
mammary subpopulation sample to centroids created
using either the HsEnriched-derived gene signatures or to
centroids created using the gene signatures derived separately
from each human study (Supplemental Fig. 1). The
HsEnriched centroids had a significantly reduced Euclidean
distance (~70 %) to each mammary subpopulation
(t test p < 0.0001), indicating greater specificity for the
consensus HsEnriched signatures when compared with any
individual dataset’s subpopulation signature.

Table 1 Human FACS-enriched normal mammary cell subpopulation studies

Enriched population  FACS markers        Species  Source  Abbreviation       Reference
Stroma               CD49fneg, EpCAMneg  Human    Adult   aStr-Lim09         Lim et al. [19]
Stroma               CD49fneg, EpCAMneg  Human    Adult   aStr-Shehata       Shehata et al. [20]
Stroma               CD49fneg, EpCAMneg  Human    Adult   aStr-Prat          Prat et al. [21]
Stem cell            CD49fpos, EpCAMneg  Human    Adult   aMaSC-Lim09        Lim et al. [19]
Stem cell            CD49fpos, EpCAMneg  Human    Adult   aMaSC-Shehata      Shehata et al. [20]
Stem cell            CD49fpos, EpCAMneg  Human    Adult   aMaSC-Prat         Prat et al. [21]
Luminal progenitor   CD49fpos, EpCAMpos  Human    Adult   LumProg-Lim09      Lim et al. [19]
Luminal progenitor   CD49fpos, EpCAMpos  Human    Adult   LumProg-Shehata    Shehata et al. [20]
Luminal progenitor   CD49fpos, EpCAMpos  Human    Adult   LumProg-Prat       Prat et al. [21]
Mature luminal       CD49fneg, EpCAMpos  Human    Adult   MatureLum-Lim09    Lim et al. [19]
Mature luminal       CD49fneg, EpCAMpos  Human    Adult   MatureLum-Shehata  Shehata et al. [20]
Mature luminal       CD49fneg, EpCAMpos  Human    Adult   MatureLum-Prat     Prat et al. [21]

Fig. 1 Flowchart of analysis. Normal mammary tissue biopsies were
taken from female patients (a) and FACS-enriched into distinct
mammary cell subpopulations (b). Transcriptome profiling was
performed on each subpopulation using gene expression microarrays by
three different studies (c). Within each study, genes highly
expressed within each subpopulation were determined using a
two-class SAM (d). Genes commonly and specifically enriched within
each subpopulation across studies were determined to identify
‘enriched’ gene signatures (e). Each ‘enriched’ signature was
refined by supervised hierarchical clustering to identify gene
‘features’ highly correlated across a diverse set of human breast
tumors (f). These gene signatures were then used for clinical
testing (g)

Fig. 2 Comparison of mammary subpopulations across studies.
a Unsupervised hierarchical clustering was performed with the
normal human mammary subpopulation dataset using any gene that
had a log2 absolute expression value greater than three in at least
four samples. b Pearson correlations were determined between the
average expressions of each study’s subpopulations using all genes.
c The first three principal components were determined across the
human mammary subpopulation dataset

We next evaluated the utility of these signatures for
distinguishing human tumor subtypes. Figure 3b displays
the standardized average expression of each HsEnriched
signature across the human intrinsic breast tumor subtypes
[7, 9] using over 3,000 tumors [9, 31, 32]. The
aStr-HsEnriched signature was highest in claudin-low and
normal-like tumors. Interestingly, claudin-low tumors also
highly express the aMaSC-HsEnriched signature. High
expression of the aMaSC-HsEnriched signature in claudin-low
tumors is unlikely to be an artifact of stromal cells in these
tumors since the Pearson correlation between the
aStr-HsEnriched and aMaSC-HsEnriched signatures was -0.19
across the normal human mammary samples. The LumProg
and MatureLum-HsEnriched signatures were most highly
expressed in basal-like and luminal subtype tumors,
respectively (Fig. 3b).
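
A per-subtype summary of this kind can be sketched as follows. The
exact standardization used for Fig. 3b is not specified in the text,
so this example assumes a z-score of the per-tumor signature score
across all tumors followed by averaging within each subtype; the
function name `standardized_subtype_means` and the toy values are
hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def standardized_subtype_means(scores, subtypes):
    """Average z-scored signature score within each tumor subtype.

    scores: one signature score per tumor (must not all be equal).
    subtypes: the subtype label of each tumor, in the same order.
    """
    center, spread = mean(scores), pstdev(scores)
    grouped = defaultdict(list)
    for score, subtype in zip(scores, subtypes):
        grouped[subtype].append((score - center) / spread)
    return {subtype: mean(values) for subtype, values in grouped.items()}


# Toy example: four tumors, two subtypes.
toy_scores = [1.0, 3.0, 5.0, 7.0]
toy_subtypes = ["basal-like", "basal-like", "luminal A", "luminal A"]
summary = standardized_subtype_means(toy_scores, toy_subtypes)
```

With equal group sizes, as in the toy data, the two subtype means are
equal in magnitude and opposite in sign.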

We noted a considerable degree of signature variation
within a subtype, indicating that it is not necessarily the
case that all tumors of a given subtype share features with
the same normal cell type. A nearest centroid predictor
with a 10-fold cross validation error rate of 4.8 % was
created to individually determine which normal mammary
epithelial subpopulation is most similar to each tumor.
Samples with positive silhouette widths [27] were considered
to have a strong association with their particular
subpopulation, with all other tumors being categorized as
‘unclassified’ [33] (Fig. 3c). Specifically, 94 % of basal-like
tumors had LumProg expression profiles. The claudin-low
subtype had the highest percentage of tumors classified
as aMaSC (18 %), although most claudin-low tumors were
classified as having LumProg features (59 %). The
HER2-enriched subtype was predominantly classified as having
LumProg expression features. The luminal A and B subtypes
were most similar to the MatureLum subpopulation.

Fig. 3 Homo sapiens-enriched gene signatures. a HsEnriched gene
signatures were identified for each mammary subpopulation. First, the
overlap of genes highly expressed within each subpopulation across
studies was determined. This overlapping gene set was further filtered
to remove genes also identified as enriched in another subpopulation
to limit the signature to genes specific to an individual subpopulation.
The remaining genes comprised the HsEnriched gene signature for
that subpopulation, as indicated by the shaded box. b The
standardized average expression of the four HsEnriched gene
signatures was calculated across three human datasets and displayed
by intrinsic tumor subtype. c A nearest centroid predictor using the
HsEnriched gene signatures was used to determine which epithelial
features each tumor most represented. To reduce spurious findings,
any tumor with a negative silhouette width was considered to have a
weak association and was labeled as ‘unclassified’

Murine  mammary  cell  subpopulation  enriched  gene 
signatures 

Several groups have also profiled normal murine mammary
cell subpopulation expression features using FACS [22, 23]
(Table 2). In addition to highlighting conserved expression
features across species [22], murine studies are uniquely
positioned to enable comparisons with developmental
states not easily accessed in humans, including early fetal
development [23]. We were particularly interested in fetal
mammary stem cells (fMaSC) [23], which are a distinct cell
population not captured in any human study performed
thus far (Table 3). Using the same approach that we used to
derive the HsEnriched signatures, we created Mus
musculus-enriched (MmEnriched) signatures for each murine
mammary subpopulation (Fig. 4a) [22, 23].

We calculated the standardized average expression of
each MmEnriched signature across the murine intrinsic
subtypes/classes (Fig. 4b) [14]. As in human tumors, the
Str-MmEnriched signature was most highly expressed in
Normal-likeEx and Claudin-lowEx; this common feature was
anticipated given the high similarity of these two classes to
their human subtype counterparts and their known enrichment
for stroma-associated genes [14, 23]. The
aMaSC-MmEnriched signature was most highly expressed in
Class14Ex and to a slightly lesser extent in Wnt1-LateEx,
Wnt1-EarlyEx, p53null-BasalEx, and Squamous-likeEx. The
fMaSC-MmEnriched signature was most highly expressed
in WapINT3Ex, which is consistent with the finding that Int3
(Notch4) inhibits mammary cell differentiation [34, 35].
The LumProg-MmEnriched signature was highest in
PyMTEx and NeuEx. This finding was unexpected given
that these two mouse classes have been shown to resemble
luminal human tumors [13, 14]. Lastly, the
MatureLum-MmEnriched signature was most highly expressed in
Stat1Ex and Class14Ex. Both the Stat1-/- and Pik3ca-H1047R
mouse models, which define these two classes
respectively, are often ER positive [36, 37], and these data
suggest that they have MatureLum features. Class14Ex also
exhibited significant expression of the aMaSC-MmEnriched
signature, indicating that these tumors contain a
mixture or share features of multiple cell types.

Table 2 Murine FACS-enriched normal mammary cell subpopulation studies

Enriched population  FACS markers               Species  Source  Abbreviation     Reference
Stroma               Cd24neg/low/med            Mouse    Fetal   fStr-Spike       Spike et al. [23]
Stroma               Cd29neg, Cd24neg           Mouse    Adult   aStr-Lim10       Lim et al. [22]
Stem cell            Cd49fhi, Cd24hi            Mouse    Fetal   fMaSC-Spike      Spike et al. [23]
Stem cell            Cd49fhi, Cd24med           Mouse    Adult   aMaSC-Spike      Spike et al. [23]
Stem cell            Cd29pos, Cd24pos, Cd61pos  Mouse    Adult   aMaSC-Lim10      Lim et al. [22]
Luminal progenitor   Cd29neg, Cd24pos, Cd61pos  Mouse    Adult   LumProg-Lim10    Lim et al. [22]
Mature luminal       Cd29neg, Cd24pos, Cd61neg  Mouse    Adult   MatureLum-Lim10  Lim et al. [22]

Table 3 Gene set analysis of human and murine cell subpopulations

Murine subpopulation  Human subpopulation
                      Str     aMaSC   LumProg  MatureLum
Str                   0.044*  -       -        -
fMaSC                 -       -       0.4395   0.4395
aMaSC                 -       0.044*  -        -
LumProg               -       -       0.042*   0.386
MatureLum             -       0.464   0.306    0.004*

A comparative analysis of each human subpopulation versus each
murine subpopulation was performed using GSA. The FDR is displayed
for all comparisons with a positive association. Statistically
significant associations (FDR < 0.05) are marked with an asterisk

Consistent with Fig. 4b, 91 % of WapINT3Ex tumors were
classified as having fMaSC features in a nearest centroid
predictor analysis. Mouse luminal classes of breast carcinoma
(Erbb2-likeEx, MycEx, PyMTEx, and NeuEx) were most similar
to LumProg cells, which again was unexpected but consistent
with previous findings [22, 38]. Wnt1-EarlyEx,
p53null-BasalEx, and Squamous-likeEx tumors had primarily aMaSC
features. Interestingly, Claudin-lowEx and to a lesser extent
C3-TagEx tumors also had aMaSC features. All Stat1Ex tumors had
MatureLum features, consistent with being ER positive [36].

LumProg  and  fMaSC  features  predict  neoadjuvant 
chemotherapy  response 

Breast tumors respond heterogeneously to neoadjuvant
chemotherapy treatment [15]. We hypothesized that cellular
features of normal mammary subpopulations may identify
tumors most likely to respond to neoadjuvant chemotherapy.
To test this, we compiled a dataset of 702 neoadjuvant
anthracycline and taxane chemotherapy-treated patients
(Supplemental Table 2).

Although genes within each ‘enriched signature’ are
highly correlated within their respective normal cell
subpopulation, it does not necessarily follow that all genes
within a given normal cell signature would be as
coordinately regulated in tumors. Therefore, we subdivided
each signature into smaller features (feature1, feature2,
etc.) that are coordinately expressed in tumors, reasoning
that such refined ‘features’ may be more clinically robust.
All ‘enriched’ and refined ‘features’ were tested for their
ability to predict pCR to neoadjuvant chemotherapy in a
UVA (Supplemental Table 3). UVA significant signatures
(p < 0.05) were then considered in a MVA with age, ER
status, PR status, HER2 status, tumor stage, PAM50
subtype [39], and PAM50 proliferation score [39] to
determine if any mammary subpopulation ‘features’
added novel information for predicting pCR
(Supplemental Table 4).
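
The idea of splitting a signature into coordinately expressed
‘features’ can be illustrated with a simple greedy grouping. This is
a simplified stand-in for the supervised hierarchical clustering used
in the study, not a reproduction of it: a gene joins the first group
in which its Pearson correlation with every current member exceeds
the threshold, and only groups with enough genes are kept. The
function name `correlated_features` and the toy profiles are
hypothetical; the defaults mirror the thresholds stated in the
Methods (correlation > 0.5, at least ten genes).

```python
import math


def pearson(u, v):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    sd_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    sd_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (sd_u * sd_v)


def correlated_features(expr, threshold=0.5, min_genes=10):
    """Greedy grouping of signature genes by co-expression across tumors.

    expr maps gene -> expression values across the same set of tumors.
    Returns the groups (lists of genes) with at least min_genes members.
    """
    groups = []
    for gene, profile in expr.items():
        for group in groups:
            if all(pearson(profile, expr[m]) > threshold for m in group):
                group.append(gene)
                break
        else:
            groups.append([gene])  # no compatible group: start a new one
    return [group for group in groups if len(group) >= min_genes]


# Toy signature of five genes measured across four tumors.
toy_expr = {
    "a": [1, 2, 3, 4],
    "b": [2, 4, 6, 9],
    "c": [4, 3, 2, 1],
    "d": [9, 6, 4, 2],
    "e": [1, -1, 1, -1],
}
features = correlated_features(toy_expr, min_genes=2)
```

In the toy data, genes a and b rise together, c and d fall together,
and e correlates with neither group strongly enough, so two features
are returned.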

Six normal mammary gene signatures were UVA and
MVA significant (Supplemental Tables 3 and 4), with the
UVA odds ratios and 95 % confidence intervals of these six
signatures and all other ‘enriched signatures’ displayed in
Fig. 5a. Interestingly, the LumProg-HsEnriched and
LumProg-HsEnriched-feature1 signatures, both of which were
highly correlated (Fig. 5b), were significant in the UVA and
MVA analyses, indicating that tumors with LumProg features
are more likely to respond to neoadjuvant treatment.
Importantly, this response was independent of proliferation,
as highlighted by their low correlation to the
PAM50-Proliferation gene signature (Fig. 5b).

Interestingly, the fMaSC-MmEnriched signature refined
into two distinctly opposite, highly significant signatures in
both the UVA and MVA (Supplemental Tables 3 and 4;
Fig. 5b, c). While the fMaSC-MmEnriched signature was
highest in basal-like tumors, the refined signatures varied,
with fMaSC-MmEnriched-feature1 (Fig. 5d) being highest
in basal-like tumors and fMaSC-MmEnriched-feature2
(Fig. 5e) expressed in luminal tumors. Tumors with
fMaSC-MmEnriched-feature1 expression were more likely
to respond to neoadjuvant chemotherapy, while those
tumors with fMaSC-MmEnriched-feature2 were more
resistant. The fMaSC-MmEnriched-feature1 signature was
very highly correlated with the LumProg-HsEnriched
signatures (Fig. 5b), sharing four genes in common (Fig. 5d).
These results support the hypothesis that subsets of genes
within the larger ‘enriched signature’ are likely regulated
by different biological mechanisms.

Fig. 4 Mus musculus-enriched gene signatures. a MmEnriched gene
signatures were identified for each mammary subpopulation. First, the
overlap of genes highly expressed within each subpopulation across
studies was determined. This overlapping gene set was further filtered
to remove genes also identified as enriched in another subpopulation
to limit the signature to genes specific to an individual subpopulation.
The remaining genes comprised the MmEnriched gene signature for
that subpopulation, as indicated by the shaded box. b The
standardized average expression of the five MmEnriched gene
signatures was calculated across a murine dataset and displayed by
intrinsic tumor class. c A nearest centroid predictor using the
MmEnriched gene signatures was used to determine which epithelial
features each tumor most represented. To reduce spurious findings,
any tumor with a negative silhouette width was considered to have a
weak association and was labeled as ‘unclassified’
Fig. 5 fMaSC-enriched gene signatures. a The univariate logistic
regression odds ratio predicting pathologic complete response to
neoadjuvant anthracycline and taxane chemotherapy was determined
using a 702 patient dataset, with the 95 % confidence interval shown
as a forest plot. A single * indicates that the signature was univariate
significant, while ** indicates that the signature was both
univariate and multivariate significant (p < 0.05). b Pearson
correlations of multivariate significant gene signatures and
proliferation were determined. c The standardized average expression
of the fMaSC-MmEnriched signature and its two refined signatures was
calculated across three human datasets and displayed by intrinsic
tumor subtype. d Genes in the fMaSC-MmEnriched-refined1 signature.
e Genes in the fMaSC-MmEnriched-refined2 signature
Discussion

Normal mammary gland physiology is supported by an
underlying, complex cell hierarchy [2-5]. The simplest
model treats differentiation from mammary stem cells to
progenitor cells to mature cells as unidirectional, but recent
observations indicate that bidirectional processes are also
possible for normal and neoplastic cells [11]. This
differentiation plasticity may allow tumors to acquire cell
features foreign to the initial cell-of-origin or to lose native
features through the accumulation of specific genetic
aberrations [40].

Regardless of how different cellular traits are acquired,
it is critical to identify the ‘current’ normal cellular features
within a tumor, and therefore, we first analyzed the
expression profiles of normal human and mouse mammary
epithelial cell subpopulations [19-23]. We chose to use
nomenclature that maintains continuity with the literature.
However, these terms should be considered provisional as
the complete biological profiles of these FACS fractions
are investigated [4]. Recent work by Prater et al. [41] found
that mouse ‘LumProg’ cells (CD49f+, EpCAM+) have
complete mammary gland repopulating potential, indicating
that ‘LumProg’ may be a misnomer. Importantly, even
if our understanding and naming of these cell subpopulations
change, only the retrospective interpretation of the
data presented here will be affected, not the data itself.

Using a meta-analysis approach, FACS-purified mammary
epithelial cell subpopulation ‘enriched’ gene signatures
were derived and a nearest centroid predictor was
developed to identify which normal mammary subpopulation
each human and mouse tumor most represented using
over three thousand human patients and 27 mouse models
of mammary carcinoma [14]. While these analyses imply a
cell-of-origin for a given tumor, additional experiments
(e.g., lineage tracing) will be required to unequivocally
determine this. Nevertheless, these associations at the very
least identify which normal mammary subpopulation a
given tumor most represents in its current state.

With this in mind, several associations between both the
human and mouse intrinsic subtypes and specific normal
cell subpopulations were observed. First, human basal-like
tumors have been referred to as ‘undifferentiated’, which is
consistent with their exhibiting LumProg [19] and fetal
MaSC features [23]. Three mouse classes have been
identified to be human basal-like counterparts: MycEx,
p53null-BasalEx, and C3-TagEx [14]. MycEx tumors were
the most similar to the LumProg cell profile. By contrast,
both p53null-BasalEx and C3-TagEx tumors had adult
MaSC features. These results indicate that MycEx tumors
share cell features with their human basal-like counterpart,
making them an attractive mouse model for studying
basal-like tumors with aberrant Myc signaling [10, 42].
Interestingly, neither p53null-BasalEx nor C3-TagEx tumors
had strong LumProg features, indicating that their
association with human basal-like tumors is more likely driven
by their underlying genetics [10].

Human claudin-low tumors had heterogeneous normal
cell features. While most were similar to LumProg cells,
the claudin-low subtype also had the largest percentage of
tumors classified as adult MaSC. Given that claudin-low
tumors are enriched with epithelial-to-mesenchymal transition
features [9, 43, 44], our results suggest that these
tumors may originate from the LumProg population prior
to acquiring adult MaSC and/or mesenchymal features.
Similarly, mouse Claudin-lowEx tumors were also strongly
associated with the adult MaSC population, indicating that
such tumors may be the closest analogs of the subset of
human claudin-low tumors with adult MaSC features.

Human HER2-enriched tumors were the most similar to
the LumProg subpopulation. This is a novel finding and
may explain why both human basal-like and HER2-enriched
subtype tumors show high TP53 mutation frequencies
(>70 %) and widespread chromosomal instability [10].
These data could suggest that the normal LumProg cell is
somehow extremely dependent on TP53 function. The
murine Erbb2-likeEx class has been identified as a mouse
counterpart for human HER2-enriched tumors [14] and was
shown here to also have LumProg features.

When  analyzing  the  human  luminal  A  and  B  subtypes,  a 
clear  association  with  normal  MatureLum  cells  was 
observed.  The  murine  NeuEx  class  is  a  proposed  counter¬ 
part  for  human  luminal  A  tumors  [14],  yet  these  mouse 
tumors  were  most  similar  to  normal  mouse  LumProg  cells. 
The  MycEx  class  was  also  identified  to  resemble  human 
luminal  B  tumors  [14].  As  discussed,  MycEx  tumors  have 
LumProg  features;  therefore,  most  mouse  luminal  A/B 
tumor  models  do  not  share  the  same  normal  cell  features  as 
their  human  tumor  counterparts.  These  differences  may 
reflect  limitations  of  model  system  design,  as  tumors  within 
these  mouse  classes  are  primarily  driven  by  either  the  WAP 
or  MMTV  promoter.  These  differences  in  cell  features, 
however,  indicate  that  the  trans-species  associations 
observed  previously  [14]  are  possibly  driven  by  the 
genetics  of  each  mouse  model.  Nevertheless,  broad 
molecular features are conserved between these human-murine
counterparts [14]. Therefore, we propose that these
mouse  models  retain  significant  preclinical  utility  provided 
that  shared  versus  distinct  molecular  features  are  taken  into 
account. 

Neoadjuvant  chemotherapy  is  a  common  approach  for 
treating  breast  tumors,  but  only  a  relatively  low  percentage 
of  patients  have  a  pCR  (~20  %  overall).  We  tested  the 
clinical significance of normal cellular features for predicting
pCR using a combination of UVA and MVA
logistic regression analyses. Human LumProg and mouse
fetal MaSC expression features were identified as predictive
of pCR sensitivity across all breast cancer patients.
More specifically, LumProg-HsEnriched-feature1 and
fMaSC-MmEnriched-feature1 were highly expressed in
basal-like  tumors.  This  is  consistent  with  the  clinical 
observation  that  basal-like  tumors  have  better  neoadjuvant 
chemotherapy  response  rates  since  higher  expression  of 
these  normal  cell  signatures  was  associated  with  a  higher 
likelihood  of  pCR.  Distinct  from  these  signatures,  tumors 
with  high  expression  of  fMaSC-MmEnriched-feature2 
were  more  resistant  to  neoadjuvant  chemotherapy.  Not 
surprisingly,  this  signature  was  most  highly  expressed  in 
luminal  A  and  B  tumors,  consistent  with  the  clinical 
observation  that  these  subtypes  have  lower  chemotherapy 
response  rates.  Importantly,  these  signatures  remained 
significant  even  after  controlling  for  intrinsic  subtype, 
proliferation,  and  clinical  variables  in  the  MVA  analysis; 
thus  these  normal  cell  signatures  add  information  even 
when  tumor  subtype  and  clinical  features  are  known.  It  is 
presently  unknown  whether  tumors  with  these  features 
arise  from  a  LumProg  or  fetal  MaSC  cell-of-origin  or 
acquire  these  features  during  tumorigenesis.  Whether  these 
features  are  acquired  or  inherent,  the  ‘current’  cellular 
traits  of  a  tumor  are  likely  most  important  as  these  appear 
to  be  a  major  determinant  of  chemotherapy  sensitivity.  The 
biological  explanation  for  why  LumProg  and  fetal  MaSC 
expression features predict tumor responsiveness to neoadjuvant
chemotherapy will need to be explored further,
but it is likely linked to the common genetic features of
TP53 loss [45], RB-pathway loss [46], and high proliferation
status [47], as well as other inherent characteristics of
these  cellular  states.  This  work  highlights  the  efficacy  of 
studying  the  normal  mammary  gland  cell  hierarchy  and 
development  to  provide  insights  into  human  tumor  therapy 
responsiveness. 
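
The multivariable (MVA) logistic regression described above can be illustrated with a minimal sketch. This is not the analysis code used in the study: the cohort is synthetic, and the variable names (`signature`, `proliferation`, `basal`) and effect sizes are hypothetical, chosen only to show how a signature term can be estimated while adjusting for other covariates.

```python
import math
import random


def sigmoid(z):
    """Logistic function mapping a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, steps=1000, lr=0.5):
    """Fit logistic regression by batch gradient ascent on the mean log-likelihood.

    rows: list of feature lists (an intercept column is added here).
    Returns coefficients [intercept, b1, b2, ...].
    """
    n = len(rows)
    k = len(rows[0]) + 1
    coef = [0.0] * k
    for _ in range(steps):
        grad = [0.0] * k
        for x, y in zip(rows, labels):
            xi = [1.0] + list(x)
            p = sigmoid(sum(c * v for c, v in zip(coef, xi)))
            for j in range(k):
                grad[j] += (y - p) * xi[j]
        coef = [c + lr * g / n for c, g in zip(coef, grad)]
    return coef


# Synthetic cohort (hypothetical): pCR depends on a signature score,
# a proliferation score, and a basal-like subtype indicator.
rng = random.Random(0)
rows, labels = [], []
for _ in range(400):
    signature = rng.uniform(-1, 1)
    proliferation = rng.uniform(-1, 1)
    basal = rng.choice([0, 1])
    logit = -1.4 + 1.5 * signature + 0.8 * proliferation + 0.5 * basal
    rows.append([signature, proliferation, basal])
    labels.append(1 if rng.random() < sigmoid(logit) else 0)

coef = fit_logistic(rows, labels)
# exp(coefficient) is the odds ratio for the signature, adjusted for the
# proliferation and subtype terms that are also in the model.
odds_ratio_signature = math.exp(coef[1])
print("adjusted odds ratio for signature:", round(odds_ratio_signature, 2))
```

An adjusted odds ratio above 1 for the signature term corresponds to the situation described in the text, where a signature still carries information about pCR once subtype and proliferation are in the model.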
SUMMARY

To discover mechanisms that mediate plasticity in
mammary cells, we characterized signaling networks
that are present in the mammary stem cells responsible
for fetal and adult mammary development.
These analyses identified a signaling axis between
FGF signaling and the transcription factor Sox10.
Here, we show that Sox10 is specifically expressed
in mammary cells exhibiting the highest levels of
stem/progenitor activity. This includes fetal and adult
mammary cells in vivo and mammary organoids
in vitro. Sox10 is functionally relevant, as its deletion
reduces stem/progenitor competence whereas its
overexpression increases stem/progenitor activity.
Intriguingly, we also show that Sox10 overexpression
causes mammary cells to undergo a mesenchymal
transition. Consistent with these findings, Sox10
is preferentially expressed in stem- and mesenchymal-like
breast cancers. These results demonstrate
a signaling mechanism through which stem
and mesenchymal states are acquired in mammary
cells and suggest therapeutic avenues in breast cancers
for which targeted therapies are currently unavailable.


INTRODUCTION 

The  capacity  to  reprogram  differentiated  cells  in  vivo  and  ex  vivo 
indicates  that  the  differentiated  state  is  not  as  fixed  as  once 
thought  (Takahashi  and  Yamanaka,  2006;  Tata  et  al.,  2013). 

This  plasticity  has  important  implications  for  cancer,  where  the 
dysregulation  of  stem  and  mesenchymal  states  appears  to  be 
critical  in  disease  initiation  and  progression.  Phenotypic  lability 
may  endow  some  types  of  cancer  cells,  often  termed  “cancer 
stem cells” (CSC), with a greater capacity to propagate the disease
when assayed in a transplant setting (Al-Hajj et al., 2003;
Bonnet and Dick, 1997). In contrast to CSCs, which typically
exhibit mesenchymal characteristics, transcriptome analyses
have  revealed  another  class  of  tumorigenic  cancer  cells  whose 
gene  expression  profiles  resemble  those  of  cells  with  known 
stem  or  progenitor  cell  functions.  Tumors  with  these  distinct 
“stem-like”  cancer  cells  tend  to  appear  less  differentiated  and 
behave  more  aggressively,  while  eliminating  such  cells  can 
attenuate tumor progression (Chen et al., 2012; Eppert et al.,
2011; Merlos-Suarez et al., 2011; Schepers et al., 2012). Stem-like
cancer cells may arise either by cell of origin, in which the
tumor  originates  in  a  stem/progenitor  cell  and  retains  those 
properties  through  tumorigenesis,  or  through  reprogramming 
of  differentiated  cells  into  a  stem-like  state  (Barker  et  al.,  2009; 
Schwitalla et al., 2013). Because a significant fraction of triple-negative
breast cancers contain stem-like cancer cells, we
have  focused  on  elucidating  the  molecular  mechanisms  that 
specify  the  mammary  stem  cell  (MaSC)  state,  assuming  that 
such  knowledge  will  deepen  our  understanding  of  how  such 
breast  cancers  initiate  and  progress. 

The  mammary  gland  contains  at  least  two  populations  of  cells 
with  stem  or  progenitor  qualities  (Shackleton  et  al.,  2006;  Stingl 
et  al.,  2006).  Luminal  progenitors  comprise  a  heterogeneous 
population  of  cells  in  the  luminal  fraction  of  the  gland  that 
possess  clonogenic  properties  in  vitro  (Shehata  et  al.,  2012). 
This  population  may  contain  the  cell  of  origin  for  stem-like 
basal-like  breast  cancers  (Lim  et  al.,  2009).  Transplantation 
studies also demonstrate that the basal fraction of the gland contains
cells capable of generating an entire mammary gland.
These  MaSCs  are  inferred  to  possess  extensive  proliferative, 
invasive,  and  multi-lineage  differentiation  potential,  as  a  single 
MaSC  can  regenerate  a  functional  gland  (Shackleton  et  al., 
2006). 

Several  fundamental  aspects  of  MaSC  biology  remain  to  be 
elucidated.  There  is  no  consensus  on  the  number  of  MaSCs 
within  the  gland,  which  has  hindered  analyses  of  the  origin  of 
breast  tumors  (Tomasetti  and  Vogelstein,  2015).  There  are  also 
conflicting  data  about  the  topographical  location  of  MaSCs  in 
the  gland  and  the  developmental  timeframe  during  which  these 
cells retain multi-lineage potential (Rios et al., 2014; Van Keymeulen
et al., 2011). Both of these problems might be resolved
by availability of markers enabling prospective MaSC identification.
The mechanisms by which mammary cells enter and exit
the MaSC state also remain to be defined, and resolving this
problem may present solutions to those concerning MaSC identification.
One recent advance on this topic involves the demonstration
that Sox9 and Slug act together to convert mammary
epithelial  cells  into  cells  with  MaSC-like  properties  (Guo  et  al., 
2012).  However,  the  degree  to  which  this  mechanism  is  utilized 
in  the  gland  is  not  clear  because  the  distribution  and  function 
of  Sox9  or  Sox9/Slug  cells  in  unperturbed  in  vivo  contexts  remain 
to  be  defined.  Moreover,  mice  that  are  deficient  for  Slug  do  form 
a  complete  native  mammary  gland,  which  suggests  that  Slug  is 
not  an  essential  determinant  of  the  MaSC  state  (Nassour  et  al., 
2012). Clearly, a better understanding of the transcriptional programs
and extrinsic signaling mechanisms that regulate the
MaSC  state  is  required. 

To  investigate  the  biology  of  MaSCs  and  MaSC-like  cells  in 
cancer, our research has focused on the stem cells present during
fetal mammary development. During mid-late embryogenesis,
mammary cells are highly proliferative and invasive and
likely experience conditions such as hypoxia and growth-oriented
metabolism that resemble those encountered by tumor
cells (Masson and Ratcliffe, 2014). Fetal MaSCs (fMaSCs) may
therefore most resemble the MaSC-like cancer cells in breast tumors.
Indeed, we previously showed that fMaSCs exhibit both
the  organoid-forming  and  mammary-repopulating  properties 
found  in  luminal  progenitors  and  adult  MaSCs,  respectively 
(Spike  et  al.,  2012).  Transcriptome  profiling  of  fMaSCs  and  adult 
MaSCs  revealed  that  the  fMaSC  signature  gene  list  is  uniquely 
enriched  in  basal-like  breast  tumors,  indicating  the  presence  of 
fMaSC-like  cells  in  such  tumors.  This  shared  biology  suggests 
that  fetal  mammary  development  and  fMaSCs  can  be  utilized 
to identify molecular mechanisms that govern important functions
in breast cancer.

Here, we describe how analysis of fMaSCs revealed an important
function for Sox10 in mammary cells. Sox family transcription
factors have well-defined roles in regulating cell-fate
decisions in different tissues and at different stages of development
(Sarkar and Hochedlinger, 2013). Sox factors generally
induce preferential differentiation down one cell lineage path
over another, often by antagonizing the activity of other lineage-specifying
factors. This phenomenon has best been
described with Sox2 and the elucidation of roles for Sox2 in multiple
different cell-fate decisions, each of which occurs in concert
with other transcription factors (Sarkar and Hochedlinger, 2013).
However, when Sox expression or activity is balanced or kept at
lower levels in the cell by other key factors, differentiation is forestalled
and stem and progenitor functions arise (Kopp et al.,
2008). This is consistent with an emerging model of stem cell
specification through the balance of lineage specifiers (Loh
and Lim, 2011). Sox factors can thus be mediators and markers
of both differentiation and stemness, depending on expression
level and cellular context.

Here, we report that Sox10 plays important regulatory roles in
promoting both stem- and epithelial-to-mesenchymal transition
(EMT)-like properties in mammary stem cells. Critically, these
stem and mesenchymal states are acquired independently of
one another; this clear distinction prevents potential conflation
of stem cell and mesenchymal properties, and demonstrates
how these distinct states can be related by a single factor such
as Sox10. We further present evidence that these functions
may be conserved in certain types of aggressive breast cancers,
and demonstrate the importance of FGF10 in a paracrine
signaling mechanism that regulates Sox10.

RESULTS 

Sox10 Is an fMaSC- and Tumor-Associated
Transcription Factor Regulated by Fibroblast Growth
Factor Signaling

To  identify  molecular  mechanisms  that  specify  stem/progenitor 
cell  functions  in  mammary  cells,  we  analyzed  transcriptome 
profiles  of  fMaSCs  and  their  surrounding  fetal  stroma  (fStr) 
(Spike  et  al.,  2012).  We  prioritized  both  transcription  factors 
that are differentially expressed in the fMaSC-enriched population
and inferred signaling axes between fMaSCs and fStr
that could regulate their expression. These analyses identified
Sox10 as one of the most prominent transcription factors associated
with the fMaSC population (Figure 1A). This was of immediate
interest, as Sox family transcription factors play important
roles in pluripotent or tissue-specific stem cell states (Sarkar
and Hochedlinger, 2013). Further, Sox10 in particular has
been shown to be a critical transcription factor in reprogramming
differentiated cells into multipotent stem/progenitor states
(Hornig et al., 2013; Kim et al., 2014; Najm et al., 2013; Yang
et  al.,  2013). 

These  analyses  also  revealed  high  relative  expression  of 
FGF7 and FGF10 in the fStr and expression of multiple fibroblast
growth factor receptor (FGFR) family members in the
fMaSC  population  (Figure  1A).  Fibroblast  growth  factor  (FGF) 
signaling  plays  a  critical  role  in  fetal  mammary  development, 
and  we  previously  showed  that  fMaSCs  could  utilize  FGF 
signaling  to  promote  multipotent  growth  in  vitro  (Lu  et  al., 
2008;  Mailleux  et  al.,  2002;  Spike  et  al.,  2012).  Furthermore, 
FGF  signaling  has  been  shown  to  regulate  the  expression 
and  function  of  different  Sox  family  transcription  factors  in 


Figure 1. Sox10 Is an fMaSC- and Tumor-Associated Transcription Factor Regulated by FGF Signaling

(A) Log2 microarray expression values for Sox10 and FGF signaling molecules in E18 fMaSCs and fStroma.

(B) E18 fMaSCs grown in 3D culture conditions for 5-7 days with the indicated media. Scale bar, 150 μm.

(C) Sox10 mRNA expression levels in fMaSC-derived organoids grown with FGFRi for 7 days. Y axis represents Sox10 mRNA levels normalized to the vehicle.

(D) FACS-based quantification of Venus+ cells in 7-day-old FGFRi-treated organoids grown from Sox10-H2BVenus fMaSCs or adult mammary luminal progenitors. Y axis represents the # of Venus+ cells as a % of the total # of cells in the primary organoids, normalized to the vehicle.

(E) FACS-based quantification of Venus+ cells in 8-day-old organoids grown from E18 Sox10-H2BVenus fMaSCs in defined growth factors. X axis is Venus fluorescence, and the number in the box is % gated Sox10+ cells.

(F) Whisker plots for Sox10 expression from the Metabric and UNC885 breast tumor databases across multiple subtypes. Each dot is a Sox10 expression value from a particular tumor.

Error bars represent SD.

Figure 2. Sox10 Is a Fetal Mammary Stem Cell Marker that Improves fMaSC Purification

(A) Whole-mount view of the one to three mammary rudiment pairs in an E18 Sox10-H2BVenus embryo.

(B and C) Venus fluorescence in E16 and E18 Sox10-H2BVenus mammary rudiment whole mounts.

(D) Whole-mount mammary rudiment from E18 Sox10-H2BVenus embryo immunostained with luminal (K8) and basal (K14) markers.

(E) FACS of E18 Sox10-H2BVenus fetal mammary cells (pre-gated for EpCAM+ cells).

(F) Keratin immunostain of single E18 Sox10flox-GFP EpCAM+ fetal mammary cells.

multiple developing tissues through a feedback loop of unknown
mechanism (Chen et al., 2014; Seymour et al., 2012).

These observations led us to hypothesize that an FGF signaling
axis may regulate Sox10 expression in mammary stem/progenitor
cells.

To address this, we grew fMaSCs in 3D culture conditions in
the presence of the pan-FGFR inhibitor, JNJ-42756493 (FGFRi).
With vehicle only, fMaSCs form organoids when either epidermal
growth factor (EGF) or basic FGF (FGF2) is added to the media
but fail to form organoids if neither growth factor is present
(Figure 1B; Figure S1). The addition of FGFRi blocks organoid formation
if FGF is the only available growth factor. However, organoid
formation is rescued upon adding EGF to media containing
FGFRi (Figure 1B). As the number of dead cells does not increase
in FGFRi-treated organoids (data not shown), these data demonstrate
that fMaSC-derived organoids can utilize FGF signaling
and indicate that FGFRi blocks FGF signaling without eliciting
overt cytotoxicity.

To determine if FGF signaling regulates Sox10 expression
in mammary cells, we measured Sox10 expression levels in
fMaSC-derived organoids plated with vehicle or increasing concentrations
of FGFRi. Organoid exposure to FGFRi resulted in
significant dose-dependent decreases in Sox10 mRNA expression
levels (Figure 1C). Similarly, by using a Sox10-H2BVenus
bacterial artificial chromosome (BAC) transgenic mouse line (in
which H2B-Venus is expressed under Sox10 transcriptional regulatory
elements) to quantify the Sox10+ cells through Venus
fluorescence, we found that FGFRi exposure significantly
reduced the number of Sox10+ mammary organoid cells
(Figure 1D). This effect was observed in a serum-based medium or
in a serum-free medium (SFM) containing defined growth factors
(Figure 1D; Figure S1). Organoids that were generated from adult
luminal progenitors also showed a reduction in Sox10+ cells
following FGFRi exposure (Figure 1D). fMaSCs grown in the
presence of SFM with EGF + FGF10 developed into organoids
with increased numbers of Sox10+ cells compared to fMaSCs
grown only in SFM with EGF (Figure 1E). This effect was not
seen in fMaSCs grown with SFM containing EGF + FGF2, indicating
a specific role for FGF10 signaling through its cognate receptor,
FGFR2b. No significant differences in Sox10 levels were
observed in fMaSCs grown ± EGF (Figure S2). These data indicate
that FGF signaling specifically regulates Sox10 expression
levels in mammary cells.
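
The reporter quantification described above (Venus+ cells as a percentage of all cells, expressed relative to the vehicle control) can be sketched as follows. This is only an illustration of the arithmetic: the condition names and cell counts are hypothetical and are not data from the study.

```python
def percent_positive(positive_cells, total_cells):
    """Return reporter-positive cells as a percentage of all cells counted."""
    if total_cells <= 0:
        raise ValueError("total_cells must be positive")
    return 100.0 * positive_cells / total_cells


def normalize_to_vehicle(values, vehicle_key="vehicle"):
    """Divide every condition by the vehicle value so that vehicle equals 1.0."""
    vehicle = values[vehicle_key]
    if vehicle == 0:
        raise ValueError("vehicle value must be non-zero")
    return {condition: value / vehicle for condition, value in values.items()}


# Hypothetical FACS counts: (Venus+ cells, total cells) per condition.
counts = {
    "vehicle": (80, 200),
    "FGFRi_low": (40, 200),
    "FGFRi_high": (10, 200),
}

percents = {name: percent_positive(pos, total) for name, (pos, total) in counts.items()}
relative = normalize_to_vehicle(percents)
print(relative)
```

Normalizing to the vehicle makes a dose-dependent decrease read directly as a fraction of the untreated level, which is how the y axes of Figures 1C and 1D are described.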

To determine whether elevated Sox10 expression was a
feature common to fMaSCs and their associated human cancer
counterparts, we next analyzed the expression of Sox10 across
a panel of tumor samples representing two distinct breast
cancer datasets. This analysis revealed that basal-like and
claudin-low breast cancers tend to express significantly higher
levels of Sox10 than the other subtypes of the disease
(Figure 1F), in accordance with two recent studies of Sox10 in
breast cancer (Cimino-Mathews et al., 2013; Ivanov et al.,
2013). These two subtypes comprise the bulk of triple-negative
breast cancers, and both are frequently metastatic and aggressive.
However, they differ in that basal-like breast cancers are
weakly differentiated and the most fMaSC-like of the breast
cancer subtypes, while claudin-low breast cancers possess
the most EMT-like morphology and transcriptome among the
breast cancer subtypes (Prat et al., 2010; Spike et al., 2012).
These findings suggest that Sox10 expression may correlate
with distinct stem and mesenchymal properties in human
breast cancers.

Collectively, these data identify Sox10 as an FGF-responsive,
mammary stem cell-associated transcription factor with likely
roles in normal and transformed mammary cells.

Sox10 Is a Fetal Mammary Stem Cell Marker that
Improves fMaSC Purification

To elucidate the role of Sox10 in mammary cells, the Sox10-H2BVenus
BAC transgenic mouse line was used to visualize
Sox10+ cells. Consistent with the fMaSC transcriptome data,
Sox10 was robustly expressed in all five fetal mammary rudiment
pairs (Figures 2A-2C). The rudiments at these stages appear to
be very primitive, as the structure is amorphous at embryonic
day 16 (E16), while at E18, the lumen has not yet formed and
there is no clear segregation of the luminal marker keratin-8
(K8) and the basal marker keratin-14 (K14) (Figure 2D).

Sox10+ fetal mammary cells were recovered using flow cytometry
for more detailed molecular characterization. As cells
in the rudiment can be distinguished from surrounding stromal
cells by the epithelial cell adhesion molecule (EpCAM), fetal
Sox10+ mammary cells were isolated as Sox10+;EpCAM+.
Consistent with Figure 2C, nearly all cells appear to be Sox10+
within the rudiment by fluorescence-activated cell sorting
(FACS) analysis (Figure 2E). It is possible that the stability of
the H2B-Venus fusion protein may yield cells that no longer
express Sox10 but still retain the Venus fluorescence and
thus overrepresent Sox10 expression. To address this, a
Sox10flox-GFP mouse line in which a less stable GFP reporter is
expressed from native Sox10 transcripts was also analyzed,
and we confirmed that the majority of fetal mammary cells are
Sox10+ (Figure S3). Consistent with the Sox10-H2BVenus
whole-mount images, most single Sox10flox-GFP cells also co-expressed
K8 and K14, suggesting that they may be bipotent progenitors
or stem cells (Figure 2F).

Stem/progenitor  cell  function  in  these  Sox10+  fetal  cells 
was  next  analyzed  using  in  vitro  and  in  vivo  stem/progenitor 
cell  assays.  Single  fMaSCs  grown  in  3D  culture  conditions 
will  clonally  expand  to  generate  bi-lineage  organoids  that 
resemble  the  architecture  of  the  mammary  gland  with  inner 


(G) Efficiency of organoid formation from E18 Sox10-H2BVenus female mammary rudiments in two different media. Y axis is number of organoids per 100 cells plated.

(H) A bi-lineage organoid derived from fMaSCs.

(I) A reconstituted mammary gland following transplantation of Sox10+ fetal cells visualized by Sox10-H2BVenus reporter.

(J) Sox10-H2BVenus-derived fMaSCs (columns 1 and 2), CD24/CD49f-derived fMaSCs (columns 3 and 4), and fStroma (columns 5-7) were RNA sequenced and clustered (SAM: FDR < 0.01%) using previously indicated differentially expressed genes between fMaSC (green) and fStroma (pink).

Error bars represent SD.

full mammary gland, further indicating
that Sox10 positivity strongly correlates
with fMaSC activity (Figure 2I; Figure S3).
Collectively, the data demonstrate that
Sox10 expression labels cells in the fetal
mammary rudiment that possess bipotent
stem/progenitor features.

Notably, the organoid-forming efficiency
for fetal cells recovered with
the Sox10-Venus and EpCAM markers
represents a >3-fold improvement over
the original CD24 and CD49f fMaSC
marker strategy we previously employed.
We isolated and RNA-sequenced
E17 Sox10+;EpCAM+ fMaSCs and their
surrounding fetal stromal cells (Table
S1). In parallel, we RNA-sequenced
E17 fMaSCs isolated by sorting for
CD24hi;CD49f+ cells to assess the purification
afforded by Sox10 and EpCAM.
Comparison of these transcriptome profiles
revealed that numerous stromal-associated
genes were removed from
the E17 fMaSC profile by using Sox10
expression to purify fMaSCs (Figure 2J).
Taken together, our data show that using Sox10 as a marker
produces an fMaSC population significantly purer than obtained
previously.


Sox10 Labels Cells with Stem/Progenitor Features in
Adult Mammary Tissues

We next analyzed Sox10 expression in the adult mammary
gland. Immunofluorescence against positional markers such
as EpCAM (high in luminal cells, low in basal cells) indicated
that Sox10 expression was more restricted in the adult gland
compared to the fetal mammary rudiment (Figure 3A). To


K8+ luminal cells and external K14+ basal cells (Spike et al.,
2012). When E18 Sox10+ fetal cells were plated as single cells
into 3D culture conditions, they robustly formed bi-lineage
organoids (Figures 2G and 2H; Figure S3). This demonstrates
that the Sox10+ E18 population contains bipotent cells that
generate both luminal- and basal-like cells. By contrast, the
rarer Sox10neg fetal mammary cells formed spheres at
significantly reduced efficiency. As an in vivo metric of stem
cell function, E18 Sox10+ fetal cells were also transplanted
into cleared fat pads of immune-compromised mice. As
few as five Sox10+ fetal cells were sufficient to generate a


[Figure 3 panel labels: EpCAM (y axis); adult luminal cells; adult basal cells; Sox10-Venus negative luminal cells; Sox10-Venus positive luminal cells]


Figure 3. Sox10 Labels Cells with Stem/Progenitor Features in Adult Mammary Tissues

(A) Immunostain for EpCAM in an adult Sox10-H2BVenus mammary gland.

(B) FACS of Venus fluorescence (x axis) in adult Sox10-H2BVenus luminal and basal populations (y axis is EpCAM). Displayed are luminal cells that were pre-gated as EpCAMhi;CD49flow-med, and basal cells as EpCAMlow-med;CD49fhi.

(C) Venus(-) or Venus(+) luminal cells from an adult Sox10-H2BVenus mammary gland cultured in 3D for 6 days. Scale bar, 65 μm.

(D) Whole-mount immunofluorescence for K8 and progesterone receptor (Pgr) from adult Sox10-H2BVenus mammary glands; right image lacks Pgr for easier visualization.

(E) Transplantation take rates for Venus(-) and Venus(+) basal cells from an adult Sox10-H2BVenus mammary gland.

(F) A reconstituted mammary gland following transplantation of Sox10+ adult basal cells visualized by the Sox10-H2BVenus reporter.

Figure 4. Sox10 Functionally Contributes to Stem/Progenitor Activity in Mammary Cells

(A) Organoids from Sox10-H2BVenus fMaSCs contain Venus(+) and Venus(-) cells.

(B and C) Efficiency of secondary organoid formation for Venus(+) and Venus(-) cells taken from primary Sox10-H2BVenus fMaSC organoids grown in SFM. Y axis is number of secondary organoids per 100 cells plated.

(D) Representative organoid formation following 3D culture of Cre-infected Sox10wild-type or Sox10flox/flox fMaSCs.

(E) Carmine staining of transplanted Cre-infected Sox10wild-type or Sox10flox/flox fMaSCs into cleared fat pads. Transplants were considered takes if greater than half the fat pad was reconstituted; * marks a partial aborted outgrowth.

Error bars represent SD.


quantify the expression of Sox10 by cell type, Sox10-H2BVenus
and Sox10flox-GFP adult glands were FACS sorted into basal
and luminal fractions using EpCAM/CD49f, and the percentage
of Sox10+ cells in each fraction was then determined. These
analyses revealed that nearly all basal cells express Sox10,
whereas ~50% of luminal cells express Sox10 (Figure 3B;
Figure S4).

Mammary stem/progenitor cell assays were performed on
these Sox10+ basal and luminal cells to better understand their
function in the gland. Sox10+ and Sox10neg luminal cells were
isolated by FACS and plated into 3D culture conditions. While
Sox10+ luminal cells demonstrated sphere-forming potential
with luminal characteristics (18.0 ± 2.1%), Sox10neg luminal cells
did not form spheres (0.3 ± 0.3%; Figure 3C; Figure S4). This
suggests that Sox10+ luminal cells demarcate the colony-forming
luminal progenitor cells in the luminal fraction of the mammary
gland. Consistent with this, Sox10+ cells do not express
progesterone receptor, a mature luminal cell marker, which is
instead exclusively expressed in Sox10neg luminal cells
(Figure 3D). In the basal cell fraction, both Sox10+ and less common
Sox10neg basal cells were transplanted into cleared fat pads to
determine MaSC function in an in vivo context. Sox10+ basal
cells exhibited robust repopulation potential, whereas no successful
transplantation was observed with Sox10neg basal cells
(Figures 3E and 3F). Sox10+ luminal cells also failed to exhibit
successful transplantation, further indicating that these are
lineage-restricted progenitor cells.
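
Sphere- and organoid-forming efficiencies such as those reported above are expressed as structures formed per 100 cells plated and summarized across replicates as mean ± SD. A minimal sketch of that calculation follows; the replicate counts are hypothetical and were chosen only to make the arithmetic easy to follow.

```python
from statistics import mean, stdev


def forming_efficiency(structures, cells_plated):
    """Return structures formed per 100 cells plated (a percentage)."""
    if cells_plated <= 0:
        raise ValueError("cells_plated must be positive")
    return 100.0 * structures / cells_plated


def summarize(replicates):
    """Return (mean, sample SD) of efficiency over (structures, cells) replicates."""
    efficiencies = [forming_efficiency(s, c) for s, c in replicates]
    return mean(efficiencies), stdev(efficiencies)


# Hypothetical replicate wells: (spheres counted, cells plated).
replicates = [(16, 100), (18, 100), (20, 100)]
avg, sd = summarize(replicates)
print(f"{avg:.1f} ± {sd:.1f}%")
```

The sample standard deviation is used here because each well is treated as one replicate drawn from a larger set of possible platings; whether the study used sample or population SD is not stated in the text.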

These data indicate that populations with known mammary
stem/progenitor cell properties (fMaSCs in the fetal rudiment,
repopulating MaSCs in the adult basal fraction, and luminal progenitors
in the luminal layer of the mammary gland) all appear to
express Sox10.

Sox10 Labels Cultured Mammary Cells with Stem/Progenitor
Characteristics In Vitro

The correlation of Sox10 expression with mammary stem/progenitor
populations in vivo led us to next investigate if Sox10
also labels cells with these properties in organoids grown
from fMaSCs in vitro. To address this, Sox10-H2BVenus
fMaSCs were grown into bi-lineage organoids in 3D culture
conditions. Intriguingly, these structures exhibited mosaic
Sox10 expression in which Sox10+ and Sox10neg cells were
clearly evident (Figure 4A). To determine if these cells differ
in stem/progenitor functionality, these populations were isolated
and replated into identical organoid-forming conditions
to generate secondary organoids in a classic surrogate assay
of self-renewal for stem cells. Notably, Sox10+ cells from
primary organoids had significantly greater potential to form
secondary organoids than Sox10neg cells (Figure 4B). Further,
the secondary structures from Sox10+ cells were larger and
yielded clear bi-lineage differentiation with both luminal and
basal cell types present (Figure 4C). The rare secondary outgrowths
derived from Sox10neg cells were by contrast smaller
and appeared to lack the bi-lineage structure observed in
primary and Sox10+ secondary organoids (Figure 4C). These
secondary organoids appeared to show more luminal-restricted
Sox10 expression compared to primary organoids,
which may reflect the restriction in stem/progenitor competence
that occurs in this differentiation medium, and may
mimic native mammary cell hierarchy. These data indicate
that in addition to mammary cells in vivo, Sox10 labels populations
with enhanced stem/progenitor functions in cultured
mammary organoids in vitro.

Sox10 Functionally Contributes to Stem/Progenitor Activity in Mammary Cells

We next determined if Sox10 actively contributes to fMaSC
function by performing stem/progenitor assays on cells in
which Sox10 expression was ablated by deletion. We infected
Sox10flox/flox and Sox10wild-type fMaSCs with Cre-expressing
lentivirus to delete Sox10 from the Sox10flox cells. While
Cre-infected Sox10wild-type fMaSCs generated typical organoids with
luminal and basal architecture resembling the mammary gland,
the Cre-infected Sox10flox/flox fMaSCs generated fewer organoids,
and the structures that did form were typically smaller and
failed to develop the morphological features of multi-lineage
organoids (Figure 4D; Figure S5).

Figure 5. Ectopic Sox10 Expression Expands Stem/Progenitor Activity and Drives
Acquisition of Mesenchymal Features

(A) Primary (1°) organoids from control (uninfected)
or Sox10OE m2rtTA fMaSCs were dissociated and
replated into 3D culture to form secondary (2°) organoids.
Shown is 2° organoid growth after 7 days.
Scale bar, 75 μm.

(B) Quantification of 2° organoid-forming potential
for Sox10OE cells compared to uninfected or
Venus-only-infected cells. y axis is # of >50 2°
organoids per 100 cells plated.

(C) Sox10OE fMaSCs present with satellite single
cell structures surrounding the 1° organoid (*).
Scale bar, 40 μm.

(D) Active delamination of cells from a Sox10OE
organoid.

(E) Immunostains of control or Sox10OE fMaSC
organoids demonstrate the loss of keratin
expression (red or green) in Sox10OE cells (blue).
Scale bar, 50 μm.

(F) Immunostains of Sox10OE fMaSC organoids
reveal upregulation of vimentin and loss of E-cadherin
in Sox10OE cells (blue).

Error bars represent SD.

We also performed transplantation assays with Cre-infected
Sox10flox/flox fMaSCs or Sox10flox/flox adult basal cells to determine
if cells were capable of generating full outgrowths
following Sox10 deletion. No full outgrowths following transplantation
were observed in the Sox10null MaSCs, whereas
equivalent numbers of control cells exhibited successful
transplantation (Figure 4E; Figure S5). Together, these data
indicate that Sox10 is required for full stem/progenitor cell
functionality.

To determine if overexpression of Sox10 can increase stem/progenitor
function in mammary cells, the Tet-on system was
used to drive expression of human Sox10 in fMaSCs. fMaSCs
isolated from a mouse strain that ubiquitously expresses the
m2rtTA reverse tetracycline transactivator were infected with
either LV-TRE-hSox10-2A-NLSVenus (doxycycline [dox] induces
expression of Sox10 and Venus) or LV-TRE-NLSVenus
(dox induces expression only of Venus) and allowed to form
primary organoids. No apparent increase in primary organoid
formation was observed with Sox10 overexpression (Sox10OE).
These primary organoids were then dissociated to single
cells, replated into identical culture conditions, and scored for
their ability to generate secondary organoids as a metric for
increased persistence of stem/progenitor function. While fMaSCs
that did not overexpress Sox10 showed low ability to form
secondary organoids in differentiation medium (Figure 5A),
Sox10OE fMaSCs now demonstrated
robust secondary organoid formation (Figures 5A and 5B).
These data indicate that ectopic expression of Sox10 is able
to increase or sustain stem/progenitor competence in cultured
fetal mammary cells.
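
The secondary organoid readout described above reduces to simple arithmetic: count the organoids that form and normalize to the number of cells plated (Figure 5B reports organoids per 100 cells plated, with error bars as SD). A minimal sketch of that calculation follows. The function names and the well counts are hypothetical placeholders chosen only for illustration, not values from this study, and the use of the sample (n - 1) SD is an assumption, since the text does not say which form was used.

```python
from statistics import mean, stdev


def organoids_per_100_cells(organoid_count, cells_plated):
    """Normalize an organoid count to organoids formed per 100 cells plated."""
    if cells_plated <= 0:
        raise ValueError("cells_plated must be positive")
    return 100.0 * organoid_count / cells_plated


def summarize_wells(wells):
    """Return (mean, sample SD) of organoid-forming efficiency across wells.

    `wells` is a list of (organoid_count, cells_plated) tuples and must
    contain at least two replicate wells for the SD to be defined.
    """
    rates = [organoids_per_100_cells(n, cells) for n, cells in wells]
    return mean(rates), stdev(rates)


# Hypothetical replicate wells (illustrative numbers only, not study data).
example_wells = [(4, 200), (6, 200), (5, 200)]
avg, sd = summarize_wells(example_wells)
print(f"{avg:.2f} +/- {sd:.2f} organoids per 100 cells plated")
```

Normalizing per 100 cells makes wells plated at different densities comparable, which matters here because plating density in these assays varied from well to well.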

Ectopic Sox10 Expression Drives an EMT-like Response in fMaSC-Derived Organoids

While measuring the stem/progenitor function of Sox10OE cells,
we discovered that primary organoids with Sox10OE cells
demonstrated a novel morphology in which the primary organoid
was surrounded by individual cells (Figure 5C). Video
microscopy showed that the satellite cells originate from the
delamination and extrusion of Sox10OE cells from the primary
organoid (Figure 5D; Movies S1 and S2). We found that
Sox10OE (Venus+) cells no longer expressed keratin markers,
suggesting that the mobility of the cells might result from
Sox10OE-induced EMT (Figure 5E; Figure S6). Sox10OE cells
also presented with additional EMT markers, including downregulated
expression of E-cadherin and upregulated expression
of vimentin (Figure 5F; Figure S6). No such changes were
observed in organoids not exposed to dox. These data demonstrate
that Sox10 can directly mediate an EMT-like response
when forcibly expressed at high levels in fMaSC-derived
organoids.

We next determined if the EMT state could be reversed in
Sox10OE mammary cells and if they retained or could regain
bipotential stem/progenitor function. Sox10OE mammary cells
were isolated from primary organoid cultures and replated into
3D culture conditions with or without dox. The Sox10OE mammary
cells that were plated into dox, and thus maintained high
Sox10 expression, often persisted as single cells and did not
organize into secondary organoids (Figure 6A). However, when
these same cells were plated into dox-free media, and Sox10
levels were reduced to baseline (Figure S7), the cells now
favored the formation of bi-lineage secondary organoids
(Figure 6A).

The same phenomenon was observed when Sox10OE organoids
that had undergone EMT and cell delamination were
subjected to a protocol that removed dox from the media
and lowered Sox10 expression to basal levels. While organoids
continuously exposed to dox and high Sox10 levels
showed mostly persistent single-cell satellite structures, the
satellite cells in the dox-withdrawn organoids now initiated
the formation of localized secondary organoids (Figure 6B).
These secondary organoids exhibited the same bi-lineage features
as primary fMaSC organoids, indicating that these single
Sox10OE cells have the potential to produce both luminal- and
basal-like cells (Figure 6C). Notably, this robust secondary
organoid formation occurred in the same strong differentiation
media in which cells with retained stem/progenitor qualities
are rare (Figure 4B), indicating the downstream effects of
Sox10 serve to counterbalance these pro-differentiation factors.

Figure 6. Reversal of Transient Sox10 Overexpression Restores Epithelial
Features and Promotes Stem/Progenitor Activity

(A) Sox10OE cells were isolated from 7-day-old
fMaSC-derived primary (1°) organoids and replated
in 3D culture ± dox. Secondary outgrowths
from these cells were immunostained for keratin
markers after 7 days.

(B) Sox10OE satellite cells form secondary (2°) organoids
surrounding the 1° organoid at greater
efficiency if dox is removed from the media after
4 days. Left/right are the same organoids over
10 days of culture. Scale bar, 20 μm.

(C) Sox10OE cells were allowed to form 1° organoids
in 3D culture for 7 days, then dox was
washed out of the media to cease Sox10 expression.
3-4 days after washout, the delaminated
satellite cells initiated 2° organoid formation (*)
around the 1° organoid.

These data reveal that at high levels of expression, Sox10 induces
a mesenchymal transition that enables cell
migration away from primary organoids. These cells are then
capable of undergoing a mesenchymal-epithelial transition
(MET) that mediates the formation of secondary organoids,
which appears to be favored when Sox10 expression levels
are reduced.

FGF Signaling Is Required for Sox10-Induced Cell Motility

We next attempted to identify mechanisms through which
Sox10 evokes stem/progenitor and EMT/motility functions in
mammary cells. The feedback loop between Sox transcription
factors and FGF signaling that appears to involve Sox10
and FGF10 in mammary cells (Figure 1) suggests that these
Sox10-mediated cell functions could involve FGF signaling. To
test this, fMaSCs were manipulated to overexpress Sox10 as
before, but this time in the presence of FGFRi. As expected,
fMaSCs that were given vehicle formed primary organoids
and the overexpression of Sox10 elicited an EMT-like delamination
of cells (Figure 7A). However, this cell delamination was
significantly attenuated in organoids that were exposed to the
FGFRi, as indicated by the absence of satellite cells surrounding
the primary organoid (Figures 7A and 7B). Sox10OE organoids
that were grown in media without FGF also failed to extrude satellite
cells, confirming that it is inhibition of FGF signaling by the
FGFRi that mediates this effect (Figure 7C). These data suggest
that the potentiation of FGF signaling can be one effector of
Sox10 that mediates cell delamination and that a pan-FGFRi
blocks Sox10-induced motility in fMaSC-derived mammary
organoids.


Figure 7. FGF Signaling Is Required for Sox10-Induced Cell Motility

(A) Sox10OE organoids were grown in 3D culture in
the presence of vehicle or 1.0 μM FGFRi. Scale bar,
100 μm.

(B) Fraction of Sox10OE organoids with extruded
satellite cells after 6 days (y axis) in the presence of
vehicle or 1.0 μM FGFRi.

(C) Sox10OE organoids were grown in 3D culture in
SFM with EGF alone or EGF, FGF2, and FGF10.
Scale bar, 40 μm.

(D) Gene Ontology terms associated with significantly
down- or upregulated genes following
Sox10OE (top) and example notable genes with
altered expression by Sox10OE (bottom).

Error bars represent SD.



Transcriptome Analyses of Sox10OE Cells Indicate Potential Mediators
of Stem and EMT Functions

To more comprehensively profile the state changes elicited by
Sox10 and to identify other potential direct or indirect
targets of Sox10 that could mediate the stem/progenitor and
EMT-like functions of Sox10, we performed transcriptome
profiling of Sox10OE cells through RNA sequencing (Table S2).
In parallel, we also isolated and RNA-sequenced control
organoid cells that did not overexpress Sox10 for comparison.
To assess the quality of the sequencing data, we determined
if previously described targets of Sox10 were upregulated in
response to Sox10 overexpression. Published targets such as
Mitf, Mia, and ErbB3 all showed elevated expression in Sox10OE
cells (Bondurand et al., 2000; Graf et al.,
2014; Prasad et al., 2011) (Figure 7D).
We also analyzed targets of FGF signaling, given our data
linking Sox10 and FGF signaling. Among the targets
induced by Sox10, we found that the positive regulator of FGF
signaling Etv5 was upregulated, while the negative FGF
regulator Dusp6 was downregulated
(Figure 7D). This is consistent with the positive FGF-Sox10
loop indicated by our data, in which FGF acts to induce Sox10,
while activated Sox10 then reinforces FGF signaling. These
data validate that the differential expression of molecules between
Sox10OE and control cells can be used to identify targets
of Sox10 or signaling network changes initiated by Sox10.

We next identified genes that were significantly differentially
expressed in response to Sox10OE. Gene ontology analysis
with these gene lists indicated significant reprogramming of
cellular function that is consistent with the observed phenotypic
changes in Sox10OE cells (Figure 7D; Table S2). For example,
Sox10OE cells delaminate from the primary organoid where
they tend to remain quiescent, and indeed this analysis finds
genes associated with migration are upregulated with Sox10OE,
while genes associated with proliferation and adhesion are
downregulated with Sox10OE. Similarly, Sox10OE cells in organoids
lose differentiation marker expression and gain stem/progenitor
function during this process, and indeed genes associated
with differentiation are downregulated with Sox10OE. These
transcription data thus provide a hypothesis-generating resource to
determine how Sox10 elicits important state changes in normal
or transformed mammary cells.

Notably, ErbB2 and the estrogen and progesterone hormone
receptors all showed reduced expression levels following Sox10
overexpression. Sox10 is preferentially expressed in triple-negative
breast cancers that lack these three receptors
(Figure 1F). These data suggest that Sox10 may be one mechanism
of functionally specifying this triple-negative state.

DISCUSSION 

Our studies have used diverse strategies to reveal important
roles for Sox10 in stem and progenitor functions within mammary
cells. This is first indicated by the significant correlation between
Sox10 expression and two aggressive subtypes of breast
cancer that have previously been described as stem-like
(basal-like) or EMT-like (claudin-low). We then present data that Sox10
consistently labels cells with stem/progenitor qualities in multiple
contexts that include fetal, adult, and 3D cultured mammary
tissues. Sox10 may be a cell state regulatory node in mammary
cells, as deleting Sox10 decreased stem/progenitor functions,
while its ectopic activation both expanded stem/progenitor activity
and induced EMT. This suggests that relative expression
levels of Sox10 can mediate either stem-like or EMT-like responses
depending on context.

The link between Sox10 and both stem- and EMT-like cell
functions is reminiscent of the published links between CSCs
and EMT (Oskarsson et al., 2014). Importantly, it has been unclear
to what extent CSCs are stem-like, given that their mesenchymal
properties and transcriptome profiles often do not
resemble those of bona fide stem cells. The enhanced motility
of mesenchymalized cells may endow them with greater capacity
to aggregate and form polyclonal “tumorspheres” in suspension
cultures or to invade and form tumors more efficiently in
xenograft assays. These properties are clearly independent of
stemness measured by transcription profiling, and should not
be used as surrogates for stem cell function. These concerns
have led to the rebranding of CSCs as “tumor-” or
“xenograft-initiating cells,” which suggests the distinction between the
stem-like cells in tumors identified transcriptionally, and the
more EMT-like CSCs.

The data described here present clear evidence that the stem
cell and mesenchymal states are related and can be interconverted
in stem-like cells. We find that a single factor, Sox10, is
able to contribute to cells entering each of these two states,
and critically, we show that it does so independently of the
other state. Sox10+ cells that have not undergone EMT show
increased levels of stemness in multiple contexts, while EMT occurs
independent of stem cell activity. The separation of these
states removes the aforementioned concerns about conflating
stemness with properties of mesenchymal cells, and demonstrates
that a single molecule such as Sox10 can link these
two distinct states. Importantly, this affirms the link between
stem-like and mesenchymal states and defines a molecular
mechanism by which these state conversions can take place.

These data also yield predictions about how mammary cells
acquire stem cell-like properties in normal and cancerous states
and how these mechanisms may contribute to metastatic disease.
The capacity of Sox10 to promote both stem-like and
EMT-like behaviors suggests that Sox10 could be a factor that
mediates these two functions that are hypothesized to be
directly responsible for tumor initiation and progression. Most
notably, we have modeled the sequential stages of metastatic
behavior using only Sox10 in 3D mammary cell culture, as we
find that (1) Sox10+ cells preferentially form primary organoids, (2)
Sox10OE activates EMT to elicit delamination and migration of
cells away from the primary organoid, and (3) reduction of
Sox10 levels in these cells reverses the EMT and initiates the
establishment of separate organoids at secondary sites. It is
easy to visualize how this could similarly play out in Sox10+ tumors,
in which microenvironmental or genomic changes could
induce fluctuations in Sox10 expression levels that cycle cells
through these stem-like and EMT states to mediate metastasis.

Our findings also have implications for how stem/progenitor
cell states may be specified in mammary cells. As discussed in
the Introduction, the balanced activation of specific
lineage-determining factors is a mechanism capable of mediating
stem-like functions in cells. This model fits with observations of
Sox family transcription factors, where Sox molecules have
antagonistic relationships with other factors at cell-fate decision
points. By applying this model to Sox10 and mammary cells, our
data indicate that Sox10 may specify the basal lineage in mammary
cells. This is apparent in the expression data, where Sox10
preferentially labels the basal cell fraction in the adult mammary
gland, and the functional data, as Sox10OE can elicit EMT in
mammary cells, and basal cells can be considered “partial
EMT” based on their morphology. Furthermore, this model
predicts that Sox10 should promote stem-like qualities when in
balance with other factors. This is supported by our data linking
Sox10 expression and function to stem-like properties and
our data demonstrating that lower levels of Sox10 expression
increase efficiency of bi-lineage sphere formation and
self-renewal. These data thus support a model in which cell-fate
decisions and stemness in mammary cells are regulated by a
balance of lineage specifiers, of which Sox10 is one critical
player that favors a basal lineage. However, there are pieces of
our data that do not neatly fit this model, such as that Sox10neg
cells produce mostly basal-like organoids and Sox10OE elicits
cells that appear less differentiated. This suggests that a function
of Sox10 may be to provide cell-state plasticity, instead of, or in
addition to, a role in lineage specification.

As described in the Introduction, there is not a consensus on
the localization and frequency for MaSCs. Our data and the
balanced lineage specifier model suggest that a significant
reservoir of Sox10-expressing poised basal cells exists and
that these cells could adopt activated stem/progenitor cell
properties by the acquisition of antagonistic factors that bring
Sox10 levels into an equilibrium that favors a stem cell state.
This is consistent with work that indicates the majority of single
basal cells have the potential to generate full mammary glands
(Prater et al., 2014). Evaluating this model will require a better
understanding of how Sox10 works in concert with other,
presumably pro-luminal factors, such as Elf5, Gata3, and Notch
signaling, among others. Similarly, it will be key to evaluate
the relationship of Sox10 with basal lineage regulators such
as p63 and Slug and the stem-cell marker Lgr5 (Oakes et al.,
2014).

Finally, two of our most striking results are that the use of an
FGFR inhibitor profoundly affects the expression of Sox10 and
the delamination phenotype induced through Sox10OE. Notably,
the deletion of FGFR1 and FGFR2 results in the loss of the
transplantation-competent population of mammary stem cells
and compromises ductal remodeling, which mirror the roles for
Sox10 in stem cell competence and cell motility shown here
(Pond et al., 2013). Extrinsic signaling mechanisms in the stem
cell niche that regulate the frequency and output of stem cells
are potential targets for cancer prevention or treatment. Thus,
it will be key to determine if blocking FGF signaling also antagonizes
the expression or downstream effects of Sox10 (or other
Sox family transcription factors) in vivo in normal mammary
tissue or tumors. Together, these data imply a central role for
FGF signaling and Sox10 in normal mammary function and indicate
that tight control is required to prevent it from eliciting
malignant functions.

EXPERIMENTAL  PROCEDURES 
Mice 

Mice were housed in accordance with NIH guidelines in Association for
Assessment and Accreditation of Laboratory Animal Care (AAALAC)-accredited
facilities at the Salk Institute. All experimental protocols were
approved by the Salk Institute Institutional Animal Care and Use Committee.

Mammary  Cell  Preparation 

Single-cell preparations of fetal mammary cells were obtained by pooling
freshly dissected fetal mammary rudiments from euthanized embryos into
dissociation media (Epicult-B Basal medium [STEMCELL Technologies]
supplemented with 5% fetal bovine serum [FBS], penicillin/streptomycin,
fungizone, hydrocortisone, collagenase, and hyaluronidase). Rudiments were
then dissociated to single cells by sequentially incubating them in dissociation
medium for 1.5 hr at 37°C with gentle agitation, exposing them to ammonium
chloride for 4 min on ice to remove erythrocytes, and triturating them with
dispase and DNase. Final suspensions were passed through a 40-μm filter
to remove aggregated cells and stored in Hank’s balanced salt solution with
2% FBS for flow cytometry. Single-cell preparations of adult mammary cells
were made by dissecting out and mincing the #4 mammary glands from
6- to 12-week-old virgin female mice. Glands were then dissociated by
agitating them for 3-6 hr at 37°C in the same dissociation media. Cells were
further processed as with the fetal cells, except that trypsin and Accutase
(Life Technologies) were also utilized prior to dispase treatment to facilitate
disaggregation. Final suspensions were passed through a 40-μm filter to
remove cell clusters and stored in Hank’s balanced salt solution with 2% FBS
for flow cytometry.

Immunostaining  and  Confocal  Analyses 

Mammary tissues were immunostained through direct or indirect
immunofluorescence. Confocal microscopy was performed with equipment from the
Waitt Advanced Biophotonics Center at the Salk Institute, including Zeiss 780
inverted laser scanning confocal microscopes. Details of tissue preparation
and staining protocol are included in Supplemental Experimental Procedures.


3D  Organoid  Culture 

To generate organoids, single mammary cells were plated at 50-650
cells per well in 96-well ultra low-adhesion plates (Costar) with Matrigel.
Cells were plated in either restricted serum-free media (Epicult-B media
with B-supplement [STEMCELL Technologies] containing heparin and
penicillin/streptomycin and defined growth factors such as EGF, FGF2,
and/or FGF10) or in serum-based MCF10A media (DMEM/F12 with 5%
horse serum, hydrocortisone, cholera toxin, insulin, and ciprofloxacin,
supplemented with B27 supplement and EGF). Description of the plating
protocol and analysis of these cells is in Supplemental Experimental
Procedures.

4D  Organoid  Culture  and  Imaging 

m2rtTA fMaSCs were infected with LV-TRE-hSox10-2A-NLSVenus and plated
onto glass-bottom 35-mm dishes with a Matrigel bed in restricted serum-free
media. After 72 hr, organoids were given fresh media and dox to induce
Sox10/Venus expression. 8-24 hr later, cells were imaged at 10-min intervals with a
Zeiss CSU Spinning Disk Confocal Microscope in a climate-controlled
environment of 5% CO2 and 37°C. Images were assembled into movies using
Imaris imaging software.

RNA  Sequencing  and  Bioinformatic  Analyses 

RNA isolation, sequencing, and analysis are described in detail in
Supplemental Experimental Procedures.

Statistical  Analyses 

A two-tailed Student’s t test was used to quantify significance. p values were
represented as follows: *p < 0.05, **p < 0.005, ***p < 0.0001.
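
As an illustration of the test and star notation described above, the following is a minimal sketch of a pooled-variance (Student's) two-sample t test with a two-tailed p value, plus a mapping onto the stated thresholds. Several things here are assumptions rather than details from this study: the function names, the "ns" label for non-significant results, and the use of Simpson's rule to integrate the t density (chosen only to keep the example dependency-free; in practice a statistics library would normally be used).

```python
import math
from statistics import mean, variance


def student_t_two_tailed(a, b):
    """Pooled-variance two-sample t test; returns (t, df, two-tailed p)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Pooled sample variance (assumes equal variances; must be nonzero).
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))

    # Log of the normalizing constant of the t density with df degrees of freedom.
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    # Simpson's rule on [0, |t|]; two-tailed p = 1 - 2 * integral.
    n, upper = 2000, abs(t)
    h = upper / n
    area = pdf(0.0) + pdf(upper)
    for i in range(1, n):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3
    return t, df, max(0.0, 1.0 - 2.0 * area)


def significance_stars(p):
    """Map a p value onto the thresholds stated in the text."""
    if p < 0.0001:
        return "***"
    if p < 0.005:
        return "**"
    if p < 0.05:
        return "*"
    return "ns"  # label for non-significant results is an assumption
```

For example, `student_t_two_tailed([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])` gives t = -1 with 8 degrees of freedom, which falls well short of the 0.05 threshold and would carry no star.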

Additional experimental procedures are described in Supplemental
Experimental Procedures.